Abstract We introduce performance-based regularization (PBR), a new approach to addressing
estimation risk in data-driven optimization, and apply it to mean-CVaR portfolio optimization. We
assume the available log-return data is iid, and detail the approach for two cases: nonparametric
and parametric (the log-return distribution belongs to the elliptical family). The nonparametric PBR method
penalizes portfolios with large variability in mean and CVaR estimations. The parametric PBR 
method solves the empirical Markowitz problem instead of the empirical mean-CVaR problem, as 
the solutions of the Markowitz and mean-CVaR problems are equivalent when the log-return distri- 
bution is elliptical. We derive the asymptotic behavior of the nonparametric PBR solution, which 
leads to insight into the effect of penalization, and justification of the parametric PBR method. We 
also show via simulations that the PBR methods produce efficient frontiers that are, on average, 
closer to the population efficient frontier than the empirical approach to the mean-CVaR problem, 
with less variability.

1 Introduction



In recent years, there has been a growing interest in Conditional Value-at-Risk (CVaR) as a 
financial risk measure. This interest is based on two key advantages of CVaR over Value-at-Risk 
(VaR), the risk measure of choice in the financial industry over the last twenty years. Firstly, 
CVaR($\beta$), the conditional expectation of losses in the top $100(1-\beta)\%$ ($\beta = 0.95, 0.99$ are typical
values used in industry), is more informative about the tail end of the loss distribution than VaR($\beta$),
which is only the threshold for losses in the top $100(1-\beta)\%$. Secondly, CVaR satisfies the four
coherence axioms of Artzner, Delbaen, Eber and Heath (1999) [Acerbi and Tasche (2001)], whereas 
VaR fails the subadditivity requirement. 

Portfolio optimization with CVaR as a risk measure is first studied by Rockafellar and Uryasev 
(2000), who show that empirical CVaR minimization can be formulated as a linear program. Subse- 
quent works include CVaR optimization for a portfolio of credit instruments [Andersson, Mausser, 
Rosen and Uryasev (2001)] and derivatives [Alexander, Coleman and Li (2006)], and portfolio op- 
timization based on extensions of CVaR [Mansini, Ogryczak and Speranza (2007)]. However, most 
discussions of CVaR in portfolio optimization to date are concerned with formulation and tractabil- 
ity of the problem, and assume full knowledge of the distribution of the portfolio loss. In practice, 
one cannot ignore the fact that the loss distribution is not known and must be estimated from 
historical data, constructed from expert knowledge, or a combination of both. Naive estimation 
of the loss distribution can pose serious problems: Lim, Shanthikumar and Vahn (2011) demonstrate
how fragile the solution to the empirical mean-CVaR problem is, even in the ideal situation
of having iid Gaussian log-return data.

The issue of estimation errors in portfolio optimization is not, however, new knowledge. The 
estimation issue for the classical Markowitz (mean-variance) problem has been raised as early as 
1980 [Jobson and Korkie (1980)]. There have since been many suggestions for mitigating this issue 
for the Markowitz problem; two main approaches are robust optimization [Goldfarb and Iyengar 
(2003)] and what we call "standard regularization" [Chopra (1993), Frost and Savarino (1988), 
Jagannathan and Ma (2003), DeMiguel, Garlappi, Nogales and Uppal (2009)]. The robust opti- 
mization approach is to take the source of uncertainty (e.g. the asset log-returns, or its distribution), 
specify an uncertainty set about the source, and minimize the worst-case return-risk problem over 
this uncertainty set. The standard regularization approach is to solve the empirical mean-variance
problem, but with a constraint on the size of the solution, as measured by L2 or a more gener- 
alized norm. The term "regularization" is adopted from statistics and machine learning, where 
it refers to controlling for the size of the decision variable for better out-of-sample performance 
[Hastie, Tibshirani and Friedman (2009)]. Both robust optimization and standard regularization 
approaches have been studied for the mean-CVaR problem; Gotoh and Shinozaki (2010) and Zhu
and Fukushima (2009) show implementations of the robust optimization approach when the source
of uncertainty is, respectively, the log-return vector and the log-return distribution, and Gotoh and
Takeda (2010) demonstrate an implementation of standard regularization.

In this paper, we propose performance-based regularization (PBR), a new approach to addressing
estimation risk in data-driven optimization, and illustrate this method for the mean-CVaR
portfolio optimization problem. We demonstrate PBR for two situations: the investor has
nonparametric or parametric (specifically, the elliptical family of distributions describes the
log-returns) information on the log-returns.

The nonparametric PBR method penalizes portfolios with large variability in mean and CVaR 
estimations. Specifically, we penalize the sample variances of the mean and CVaR estimators. The 
resulting problem is a combinatorial optimization problem; however, we show that its convex
relaxation, a quadratically-constrained quadratic program, is tight. The problem can be interpreted as a
chance-constrained program that picks portfolios for which approximate probabilities of deviations
of the mean and CVaR estimations from their true values are constrained.

The parametric PBR method solves the empirical Markowitz problem instead of the empirical 
mean-CVaR problem if the underlying log-return distribution is in the elliptical family (which 
includes Gaussian and t distributions). This is based on the observation that the CVaR of a portfolio
is a weighted sum of the portfolio mean and the portfolio standard deviation if the log-return
distribution is in the elliptical family, resulting in the equivalence of the population efficient
frontiers^1 of the Markowitz and mean-CVaR problems. As we are striving to reach the population frontier with
greater stability, it makes intuitive sense to use the empirical Markowitz solution in lieu of the 
empirical mean-CVaR solution for this model. 

The PBR methods are anticipated to enhance the performance by yielding solutions that are, 
on average, closer to achieving the original objective (minimize the true CVaR subject to true 
return equal to some level). As such, the PBR approach is fundamentally different from robust 
optimization, in that robust optimization deals with the source of uncertainty to minimize the worst- 
case performance, whereas PBR deals with the performance uncertainty to increase the average 
performance. Compared to the statistics/machine learning literature, PBR for the nonparametric
case can be seen as an extension of standard regularization, in that nonparametric PBR also
constrains the decision variable, but does so indirectly by penalizing the variability of the
mean and CVaR estimations.

Details of the nonparametric PBR method can be found in Sec. 3.1 and the parametric PBR
method in Sec. 3.2. In Sec. 4 we provide theoretical results for the PBR methods after deriving
the Central Limit Theorem for the nonparametric PBR solution. In Sec. 5 we evaluate the PBR
methods against the straightforward approach of solving the empirical mean-CVaR problem for
three different log-return models via simulation experiments. We find that on average, the sample
efficient frontiers of the PBR solutions are closer to the population efficient frontier than those of
the straightforward approach.

2 Mean-CVaR portfolio optimization 

Notations. Throughout the paper, we denote convergence in probability by $\xrightarrow{P}$ and in
distribution by $\Rightarrow$. The notation $X \stackrel{d}{=} Y$ for two random variables X and Y means they
have the same distribution, and the symbol $X \sim D$ is used to indicate that the random variable X
follows some standard distribution D.

2.1 Setup 

An investor is to choose a portfolio $w \in \mathbb{R}^p$ on p different assets. Her wealth is normalized to
1, so $w^\top 1_p = 1$, where $1_p$ denotes the p x 1 vector of ones. The log-returns of the p assets are denoted
by X, a p x 1 random vector, which follows some absolutely continuous distribution F with twice
continuously differentiable pdf and finite mean $\mu$ and covariance $\Sigma$. The investor wants to pick a
portfolio that minimizes the CVaR of the portfolio loss at level $100(1-\beta)\%$, for some $\beta \in (0.5, 1)$,
while reaching an expected return R. That is, she wants to solve the following problem:

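$$
\begin{aligned}
w_0 = \operatorname*{argmin}_{w}\quad & CVaR(-w^\top X; \beta) \\
\text{s.t.}\quad & w^\top \mu = R \\
& w^\top 1_p = 1,
\end{aligned}
\tag{CVaR-pop}
$$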

^1 By "population" we mean having perfect knowledge of the market.

where

$$
CVaR(-w^\top X; \beta) := \min_{\alpha} \; \alpha + \frac{1}{1-\beta}\, E(-w^\top X - \alpha)^+ \tag{1}
$$

as in Rockafellar and Uryasev (2000). 

In reality, the investor does not know the distribution F. We assume the investor observes n iid
realizations of asset returns, $\mathsf{X} = [X_1, \ldots, X_n] \in \mathbb{R}^{p \times n}$. Then the most straightforward approach is to
solve the following problem, where plug-in estimators replace the true CVaR and return values:

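$$
\begin{aligned}
\hat{w}_n = \operatorname*{argmin}_{w}\quad & \widehat{CVaR}_n(-w^\top \mathsf{X}; \beta) \\
\text{s.t.}\quad & w^\top \hat{\mu}_n = R \\
& w^\top 1_p = 1
\end{aligned}
\tag{CVaR-emp}
$$

where

$$
\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta) := \min_{\alpha \in \mathbb{R}} \; \alpha + \frac{1}{n(1-\beta)} \sum_{i=1}^{n} (-w^\top X_i - \alpha)^+ \tag{2}
$$

is a sample average estimator for $CVaR(-w^\top X; \beta)$ and $\hat{\mu}_n = n^{-1} \sum_{i=1}^{n} X_i$ is the sample mean of
the observed asset log-returns.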
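
To make the estimator in Eq. (2) concrete, the following is a minimal sketch for a fixed portfolio, assuming the portfolio losses $-w^\top X_i$ have already been computed. The function name `empirical_cvar` and the sample data are hypothetical. The sketch uses the fact that the objective is convex and piecewise linear in $\alpha$ with kinks at the observed losses, so it is enough to evaluate it at those points; it does not solve the full problem (CVaR-emp) over w.

```python
def empirical_cvar(losses, beta):
    """Sample CVaR of a list of losses at level beta, following Eq. (2).

    The objective alpha + sum((l - alpha)^+) / (n * (1 - beta)) is convex and
    piecewise linear in alpha, so a minimizer lies at one of the observed losses.
    """
    n = len(losses)

    def objective(alpha):
        excess = sum(max(l - alpha, 0.0) for l in losses)
        return alpha + excess / (n * (1.0 - beta))

    return min(objective(alpha) for alpha in losses)


# Hypothetical portfolio losses -w^T X_i (in %), for illustration only.
sample_losses = [float(k) for k in range(1, 21)]
cvar_90 = empirical_cvar(sample_losses, beta=0.9)
```

With 20 observations and $\beta = 0.9$, only the two largest losses enter the estimate, which illustrates how few data points determine the empirical CVaR.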

2.2 Estimation risk of the empirical solution 

Asymptotically, as the number of observations n goes to infinity (with p constant), $\hat{w}_n$ converges
in probability to $w_0$ [see Sec. 4.2 for details]. In practice, however, the investor has a limited number
of relevant observations. If, for example, there are n = 250 iid daily observations, and the investor
wishes to control the top 5% of the losses, then there are only 250 x 0.05 = 12.5 points to estimate
the portfolio CVaR at level $\beta = 0.95$. For stock log-returns, n = 250 iid daily observations is rather
optimistic; there is ample empirical evidence that suggests daily log-returns are non-stationary over
this period of time [McNeil, Frey and Embrechts (2005)]. Even for time scales with more evidence
for stationarity (e.g. bi-weekly/monthly), the stationarity tends to last for no more than 5 years
[McNeil et al. (2005)].



As a result, solving (CVaR-emp) using real data results in highly unreliable solutions. Let
us illustrate this point, assuming an ideal market scenario. There are p = 10 stocks, with daily
returns following a Gaussian distribution^2: $X \sim \mathcal{N}(\mu_{sim}, \Sigma_{sim})$, and the investor has n = 250 iid
observations of X. In the following, we conduct an experiment similar to those found in Lim et al.
(2011), to evaluate the performance and reliability of solving (CVaR-emp) under this ideal scenario.
Briefly, the experimental procedure is as follows:

• Simulate 250 historical observations from $\mathcal{N}(\mu_{sim}, \Sigma_{sim})$.

• Solve (CVaR-emp) with $\beta = 0.95$ and some return level R to find an instance of $\hat{w}_n$.

• Plot the realized return $\hat{w}_n^\top \mu$ versus realized risk $CVaR(-\hat{w}_n^\top X; \beta)$; this corresponds to one
grey point in Fig. 1.

• Repeat for different values of R to obtain a sample efficient frontier.

• Repeat many times to get a distribution of the sample efficient frontier.



^2 The parameters are the sample mean and covariance matrix of data from 500 daily returns of 10 different US
stocks from Jan 2009 to Jan 2011.

The result of the experiment is summarized in Fig. 1. The green curve corresponds to the
population efficient frontier. Each of the grey dots corresponds to a solution instance of (CVaR-emp).
There are two noteworthy observations: the solutions $\hat{w}_n$ are sub-optimal, and they are highly
variable. For instance, for a daily return of 0.1%, the CVaR ranges from 1.3% to 4%.




Figure 1: Distribution of realized daily return (%) vs. daily risk, CVaR (%/day), of the empirical solution
$\hat{w}_n$. The green line represents the population frontier, i.e. the efficient frontier corresponding to
solving (CVaR-pop).

In the following section, we introduce performance-based regularization (PBR) as an approach
to improve upon (CVaR-emp). The PBR approach is so-called because its goal is to improve upon
$\hat{w}_n$ in terms of its performance, i.e. closeness to the population efficient frontier, ideally with
less variability. We describe PBR for two cases: the investor has nonparametric or parametric
knowledge of the market.



3 Performance-based regularization



3.1 Nonparametric case 

In the nonparametric case, we assume the asset log-returns X follow some distribution F with
finite mean $\mu$ and covariance $\Sigma$, and the investor has n iid observations: $\mathsf{X} = [X_1, \ldots, X_n] \in \mathbb{R}^{p \times n}$.



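The nonparametric PBR approach to (CVaR-pop) is to solve the following problem:

$$
\begin{aligned}
\min_{w}\quad & \widehat{CVaR}_n(-w^\top \mathsf{X}; \beta) \\
\text{s.t.}\quad & w^\top \hat{\mu}_n = R \\
& w^\top 1_p = 1 \\
& P_1(w) \le U_1 \\
& P_2(w) \le U_2
\end{aligned}
\tag{3}
$$

where $P_1$ and $P_2$ are penalty functions that characterize the uncertainty associated with $w^\top \hat{\mu}_n$ and
$\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta)$ respectively. The idea is to penalize decisions w for which the uncertainty about
the true values $w^\top \mu$ and $CVaR(-w^\top X; \beta)$ is large.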

What, then, are appropriate penalty functions? Recall that we are trying to find solutions 
that yield efficient frontiers that are closer to the population efficient frontier, ideally with smaller 
variability. Thus the variances of $w^\top \hat{\mu}_n$ and $\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta)$ make appropriate penalty functions,
as they characterize the deviation from the respective population values. The variance of $w^\top \hat{\mu}_n$ is
given by

$$
Var(w^\top \hat{\mu}_n) = \frac{1}{n^2} \sum_{i=1}^{n} Var(w^\top X_i) = \frac{1}{n} w^\top \Sigma w,
$$

and the variance of $\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta)$ is approximately equal to $\gamma_0^2 / (n(1-\beta)^2)$, where
$\gamma_0^2 = Var[\max(-w^\top X - \alpha_\beta, 0)]$ and

$$
\alpha_\beta = \inf\{\alpha : P(-w^\top X \ge \alpha) \le 1 - \beta\}
$$

is the Value-at-Risk (VaR) of the portfolio w at level $\beta$, due to the following lemma.

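Lemma 1. Suppose $\mathsf{X} = [X_1, \ldots, X_n] \stackrel{iid}{\sim} F$, where F is absolutely continuous with twice
continuously differentiable pdf. Then

$$
\frac{\sqrt{n}\,(1-\beta)}{\gamma_0} \left( \widehat{CVaR}_n(-w^\top \mathsf{X}; \beta) - CVaR(-w^\top X; \beta) \right) \Rightarrow \mathcal{N}(0, 1). \tag{4}
$$

Proof. See Appendix A. □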

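Of course, we do not know the true variances, so we content ourselves with the sample variances of the
estimators $w^\top \hat{\mu}_n$ and $\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta)$. That is, we consider the following penalty functions:

$$
P_1(w) = \frac{1}{n} w^\top \hat{\Sigma}_n w, \quad \text{where } \hat{\Sigma}_n = Cov(\mathsf{X}),
$$

$$
P_2(w) = \frac{1}{n(1-\beta)^2} z^\top \Omega_n z, \quad \text{where } \Omega_n = \frac{1}{n-1} \left[ I_n - n^{-1} 1_n 1_n^\top \right],
$$

$I_n$ is the n x n identity matrix, and $z_i = \max(0, -w^\top X_i - \alpha)$ for i = 1, ..., n.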
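
As an illustration of these two penalties, here is a minimal sketch that evaluates them for one fixed portfolio. It assumes that the sample covariance uses the divisor n - 1, so that $w^\top \hat{\Sigma}_n w$ is the sample variance of the portfolio log-returns and $z^\top \Omega_n z$ is the sample variance of the tail excesses. The function name `pbr_penalties` and the data are hypothetical.

```python
from statistics import variance


def pbr_penalties(returns, w, alpha, beta):
    """Sample-variance penalties P1(w) and P2(w) for one portfolio w.

    returns: list of n observations, each a list of p asset log-returns.
    """
    n = len(returns)
    # Portfolio log-returns w^T X_i and tail excesses z_i = max(0, -w^T X_i - alpha).
    port = [sum(wj * xj for wj, xj in zip(w, x)) for x in returns]
    z = [max(0.0, -r - alpha) for r in port]
    # Assumption: w^T Sigma_hat_n w is the (n - 1)-divisor sample variance of
    # w^T X_i, and z^T Omega_n z is the (n - 1)-divisor sample variance of z_i.
    p1 = variance(port) / n
    p2 = variance(z) / (n * (1.0 - beta) ** 2)
    return p1, p2


# Hypothetical data: n = 4 observations of p = 2 assets.
obs = [[0.01, -0.02], [-0.03, 0.01], [0.02, 0.00], [-0.01, -0.04]]
p1, p2 = pbr_penalties(obs, w=[0.5, 0.5], alpha=0.0, beta=0.5)
```

In the optimization problem itself $\alpha$ is a decision variable; it is fixed here only so that the two quantities can be computed directly.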

For the rest of this paper, we investigate the nonparametric PBR method with sample variance
penalty functions. Of course, this is just one particular choice, and it opens up the question of
how different penalty functions affect the solution performance, and whether there are such things
as "optimal" penalty functions. These are difficult questions worthy of further research, and we
do not investigate them in this paper. Nevertheless, we derive the asymptotic behavior of the
solution of the nonparametric PBR method in Sec. 4, which gives us some insight into i) how one
could compare the effects of different penalty functions and ii) the first-order effect of many typical
penalty functions.

The nonparametric PBR method with sample variance of return and CVaR estimators as
penalties is:

$$
\begin{aligned}
(\hat{\alpha}_n^v, \hat{w}_n^v, \hat{z}_n^v) = \operatorname*{argmin}_{\alpha, w, z}\quad & \alpha + \frac{1}{n(1-\beta)} \sum_{i=1}^{n} z_i \\
\text{s.t.}\quad & w^\top \hat{\mu}_n = R \\
& w^\top 1_p = 1 \\
& \frac{1}{n} w^\top \hat{\Sigma}_n w \le U_1 \\
& \frac{1}{n(1-\beta)^2} z^\top \Omega_n z \le U_2 \\
& z_i = \max(0, -w^\top X_i - \alpha), \quad i = 1, \ldots, n
\end{aligned}
\tag{CVaR-pen}
$$

At first glance, (CVaR-pen) is a combinatorial optimization problem due to the cutoff variables
$z_i$, i = 1, ..., n. However, it turns out that the convex relaxation of (CVaR-pen), a
quadratically-constrained quadratic program (QCQP), is tight, thus we can solve (CVaR-pen) efficiently. Before
stating the result, let us first introduce the convex relaxation of (CVaR-pen):



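$$
\begin{aligned}
\min_{\alpha, w, z}\quad & \alpha + \frac{1}{n(1-\beta)} \sum_{i=1}^{n} z_i \\
\text{s.t.}\quad & w^\top \hat{\mu}_n = R && (\nu_1) \\
& w^\top 1_p = 1 && (\nu_2) \\
& \frac{1}{n} w^\top \hat{\Sigma}_n w \le U_1 && (\lambda_1) \\
& \frac{1}{n(1-\beta)^2} z^\top \Omega_n z \le U_2 && (\lambda_2) \\
& z_i \ge 0, \quad i = 1, \ldots, n && (\eta_1) \\
& z_i \ge -w^\top X_i - \alpha, \quad i = 1, \ldots, n && (\eta_2)
\end{aligned}
\tag{CVaR-relax}
$$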



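and its dual (where the dual variables correspond to the primal constraints as indicated above):

$$
\begin{aligned}
\max\quad & g(\nu_1, \nu_2, \eta_1, \eta_2, \lambda_1, \lambda_2) \\
\text{s.t.}\quad & \eta_2^\top 1_n = 1 \\
& \lambda_1 \ge 0, \ \lambda_2 \ge 0 \\
& \eta_1 \ge 0, \ \eta_2 \ge 0
\end{aligned}
\tag{CVaR-relax-d}
$$

where

$$
\begin{aligned}
g(\nu_1, \nu_2, \lambda_1, \lambda_2, \eta_1, \eta_2) = {} & -\frac{n}{2\lambda_1} (\nu_1 \hat{\mu}_n + \nu_2 1_p - \mathsf{X}\eta_2)^\top \hat{\Sigma}_n^{-1} (\nu_1 \hat{\mu}_n + \nu_2 1_p - \mathsf{X}\eta_2) \\
& - \frac{n(1-\beta)^2}{2\lambda_2} (\eta_1 + \eta_2)^\top \Omega_n^{\dagger} (\eta_1 + \eta_2) + R\nu_1 + \nu_2 - U_1 \lambda_1 - U_2 \lambda_2,
\end{aligned}
$$

and $\Omega_n^{\dagger}$ is the Moore-Penrose pseudo inverse of the singular matrix $\Omega_n$.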

We now show (CVaR-pen) can be solved efficiently by its convex relaxation:

Theorem 1. Let $(\alpha^*, w^*, z^*, \lambda_1^*, \lambda_2^*, \eta_1^*, \eta_2^*)$ be the primal-dual optimal point of (CVaR-relax) and
(CVaR-relax-d). If $\eta_2^* \ne 1_n / n$, then $(\alpha^*, w^*, z^*)$ is an optimal point of (CVaR-pen). Otherwise, if
$\eta_2^* = 1_n / n$, we can find the optimal solution to (CVaR-relax) by solving (CVaR-relax-d) with an
additional constraint $\eta_1^\top 1_n \ge \delta$, where $\delta$ is a constant $0 < \delta \ll 1$.

Proof. See Appendix B. □

Remark 1 (Bias introduced by penalty functions).

Note that if the penalties induce active constraints (i.e. $U_1, U_2$ are small enough), the penalized
solution does not converge to $w_0$ as $n \to \infty$, i.e. the penalty constraints introduce bias. This is not a
problem, however, because we are concerned with finite sample performance, not asymptotic consistency. In
Sec. 5 we see that the bias introduced by the penalized solution is actually in the direction that
improves performance in the return-risk space.

Remark 2 (Interpretation as chance-programming).
Both $w^\top \hat{\mu}_n$ and $\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta)$ are asymptotically normally distributed, so constraining their
variances results in the reduction of the corresponding confidence intervals at some fixed level $\epsilon$.
Hence penalizing their variances can be interpreted as chance-programming [Charnes, Cooper and
Symonds (1958)]. Analytically, the chance constraint on $w^\top \hat{\mu}_n - w^\top \mu$ can be transformed to a
penalty constraint in the following manner:

$$
\begin{aligned}
& P\left( |w^\top \hat{\mu}_n - w^\top \mu| \le t \right) \ge 1 - \epsilon \\
\iff\ & 2\Phi\left( \frac{t}{\sqrt{w^\top \Sigma w / n}} \right) - 1 \ge 1 - \epsilon \quad \text{for large } n \\
\iff\ & \frac{1}{n} w^\top \Sigma w \le \left( \frac{t}{\Phi^{-1}(1 - \epsilon/2)} \right)^2 .
\end{aligned}
$$

That is, for a fixed level $\epsilon$, there is a one-to-one mapping between the parameter $U_1$ of the penalty
constraint $w^\top \hat{\Sigma}_n w / n \le U_1$ and the parameter t of the chance constraint. The (asymptotic) variance
penalty on $\widehat{CVaR}_n(-w^\top \mathsf{X}; \beta)$ has a similar interpretation as a chance constraint.

However, the penalty method can be interpreted as chance programming only if we choose the
variance of the respective estimators as the penalty functions. Although we focus on the sample
variance penalty function in this paper, we assert that the penalty method need not be restricted
to this particular choice.
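
The one-to-one mapping between the penalty level $U_1$ and the chance-constraint tolerance t in Remark 2 can be sketched numerically as follows. This is only an illustration of the asymptotic (normal) approximation; the function names and the values of t and $\epsilon$ are hypothetical.

```python
from statistics import NormalDist


def penalty_level_from_chance(t, eps):
    """U1 such that w^T Sigma w / n <= U1 matches P(|error| <= t) >= 1 - eps."""
    q = NormalDist().inv_cdf(1.0 - eps / 2.0)
    return (t / q) ** 2


def coverage_from_penalty(t, u1):
    """Asymptotic P(|w^T mu_hat - w^T mu| <= t) when the variance equals u1."""
    return 2.0 * NormalDist().cdf(t / u1 ** 0.5) - 1.0


# Hypothetical tolerance of 0.1% on the return estimate at the 95% level.
u1 = penalty_level_from_chance(t=0.001, eps=0.05)
```

Mapping `u1` back through `coverage_from_penalty` recovers the 95% coverage, which is the sense in which the two parameterizations are equivalent for a fixed $\epsilon$.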



3.2 Parametric case 



In the parametric case, we assume the asset log-returns follow an elliptical distribution; i.e.
the level sets of the distribution density function form ellipsoids. An elliptical distribution has a
stochastic representation as follows [see Anderson (1958) or Muirhead (1982)]:

$$
X \stackrel{d}{=} \mu + Y \Sigma^{1/2} U \tag{5}
$$

where $\mu$ is the mean vector, U is a p x 1 random vector uniformly distributed on the p-dimensional
sphere of radius 1 (i.e. $U \stackrel{d}{=} Z_p / \|Z_p\|_2$, $Z_p \sim \mathcal{N}(0, I_p)$), and Y is a non-negative random variable
independent of U. A special case is the Gaussian model: choosing $Y \stackrel{d}{=} \chi_p$, we get $X \sim \mathcal{N}(\mu, \Sigma)$.
The elliptical family of distributions can thus be thought of as a generalization of the Gaussian
family, and may be more reasonable for financial modeling because the non-random mixing of
covariances can capture non-trivial tail dependence and heavier tails [McNeil et al. (2005)]. In
particular, t-distributions also belong to the elliptical family.



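The parametric PBR method is to solve the empirical Markowitz problem instead of (CVaR-emp)
if X belongs to the elliptical family:

$$
\begin{aligned}
\hat{w}_n^M = \operatorname*{argmin}_{w}\quad & w^\top \hat{\Sigma}_n w \\
\text{s.t.}\quad & w^\top \hat{\mu}_n = R \\
& w^\top 1_p = 1.
\end{aligned}
\tag{Mark-emp}
$$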



The method is based on Lemma 2, which shows that the solutions of (CVaR-pop) and the population
Markowitz problem [which is the same as (Mark-emp) except with $(\Sigma, \mu)$ replacing $(\hat{\Sigma}_n, \hat{\mu}_n)$] are
equivalent if X is elliptically distributed. Lemma 2 is an extension of results mentioned elsewhere
[Rockafellar and Uryasev (2000), De Giorgi (2002)] that show the equivalence of the solutions
of (CVaR-pop) and the population Markowitz problem when X is Gaussian. However, to our
knowledge, the implication that we can solve (Mark-emp) in lieu of (CVaR-emp) to obtain a
better-performing solution has not been asserted.

Lemma 2. Suppose $X \sim Ellip(\mu, \Sigma, Y)$ as in Eq. (5) and $Y > 0$. Then the solutions of the population
mean-CVaR problem (CVaR-pop) and the population Markowitz problem are equivalent.

Proof. The proof is straightforward: we show $CVaR(-w^\top X; \beta)$ is a weighted sum of the portfolio
mean $w^\top \mu$ and portfolio std $\sqrt{w^\top \Sigma w}$.
First, the portfolio loss is:

$$
L(w) := -w^\top X \stackrel{d}{=} -w^\top \mu + Y v^\top U \sqrt{w^\top \Sigma w},
$$

where $v^\top = w^\top \Sigma^{1/2} / \sqrt{w^\top \Sigma w}$, with $\|v\|_2 = 1$. Before we compute $CVaR(-w^\top X; \beta) = CVaR(L(w); \beta)$,
we need to compute $\alpha_\beta$, the VaR of L(w) at level $\beta$ [equivalently, the $(1-\beta)$-quantile of L(w)].
Since L(w) is a continuous random variable, $\alpha_\beta = F_{L(w)}^{-1}(1 - \beta)$, where $F_{L(w)}^{-1}$ is the inverse cdf of
L(w). Now

$$
F_{L(w)}(x) = P(L(w) \le x) = P\left( Y v^\top U \ge -\frac{x + w^\top \mu}{\sqrt{w^\top \Sigma w}} \right),
$$

so to compute $\alpha_\beta$, we need the distribution of $Y v^\top U$. Since v has norm 1, $v^\top Z_p \stackrel{d}{=} Z_1$, where
$Z_1 \sim \mathcal{N}(0, 1)$, and since $U \stackrel{d}{=} Z_p / \|Z_p\|_2$,

$$
v^\top U \stackrel{d}{=} \frac{Z_1}{\sqrt{Z_1^2 + \chi_{p-1}^2}},
$$

where $\chi_{p-1}^2$ is independent of $Z_1$. Thus $(v^\top U)^2 \sim Beta(1/2, (p-1)/2)$, and by the symmetry of
the normal, we have

$$
P(Y v^\top U \ge x) = P\left( Y I(1/2) \sqrt{B} \ge x \right),
$$

where $B \sim Beta(1/2, (p-1)/2)$ and $I(1/2) \sim Bernoulli(1/2)$ on $\{-1, +1\}$ (a random sign), independent of the
rest. This quantity clearly does not depend on our choice of w, hence the solution to the equation

$$
F_{L(w)}(x) = 1 - \beta
$$

is given by

$$
\alpha_\beta = -w^\top \mu + q(1 - \beta; Y I(1/2) \sqrt{B}) \sqrt{w^\top \Sigma w},
$$

where q is a function that does not depend on w, and is unique since L(w) is a continuous random
variable.

Thus CVaR at level $\beta$ is given by

$$
\begin{aligned}
CVaR(L(w); \beta) &= \frac{1}{1-\beta} E[L(w) I(L(w) \ge \alpha_\beta)] \\
&= -w^\top \mu + G(1 - \beta; Y I(1/2) \sqrt{B}) \sqrt{w^\top \Sigma w},
\end{aligned}
\tag{6}
$$

where G does not depend on w. Hence minimizing $CVaR(L(w); \beta)$ subject to $w^\top \mu = R$ and
$w^\top 1_p = 1$ is equivalent to minimizing $w^\top \Sigma w$ subject to the same constraints, which is precisely the
population Markowitz problem. □
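
For the Gaussian special case, the weighted-sum structure of Eq. (6) has a well-known closed form, which the sketch below uses to illustrate the lemma. It assumes the standard expression for the CVaR of a normally distributed loss, with weight $\phi(\Phi^{-1}(\beta)) / (1 - \beta)$ on the standard deviation; the function name `gaussian_cvar` and the parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gaussian_cvar(w, mu, sigma, beta):
    """CVaR of the loss -w^T X at level beta when X ~ N(mu, sigma)."""
    p = len(w)
    mean = sum(w[i] * mu[i] for i in range(p))
    var = sum(w[i] * sigma[i][j] * w[j] for i in range(p) for j in range(p))
    std_normal = NormalDist()
    # Weight on the standard deviation: phi(Phi^{-1}(beta)) / (1 - beta).
    g = std_normal.pdf(std_normal.inv_cdf(beta)) / (1.0 - beta)
    return -mean + g * sqrt(var)


# Hypothetical two-asset market with equal means, so every fully invested
# portfolio has the same expected return and only the variance differs.
mu = [0.001, 0.001]
sigma = [[0.0004, 0.0001], [0.0001, 0.0009]]
cvar_a = gaussian_cvar([0.5, 0.5], mu, sigma, beta=0.95)
cvar_b = gaussian_cvar([0.9, 0.1], mu, sigma, beta=0.95)
```

Here the second portfolio has the smaller variance and therefore also the smaller CVaR, consistent with the two problems ranking portfolios identically once the expected return is fixed.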
4 Theory

We have thus far introduced nonparametric and parametric PBR methods to improve upon
the empirical mean-CVaR problem (CVaR-emp). While we evaluate these methods in Sec. 5 via
simulation experiments, it is still desirable to obtain some theoretical understanding of $\hat{w}_n$, $\hat{w}_n^v$ and
$\hat{w}_n^M$.

The solution to the empirical Markowitz problem $\hat{w}_n^M$ has an explicit form and its asymptotic
behavior has been studied elsewhere [for $X \sim \mathcal{N}(\mu, \Sigma)$, see Jobson and Korkie (1980), and for X ~
Elliptical, see El Karoui (2009)]. So we focus on deriving the asymptotic behavior of $\hat{w}_n$ and $\hat{w}_n^v$;
specifically, we show that they follow the Central Limit Theorem (CLT). Application of the delta
method from classical statistics [see, for example, Chapter 3 of Van der Vaart (2000)] then allows
us to conclude that the corresponding sample efficient frontiers also follow the CLT. From these
results, we can get some insight into the effect of the penalty functions in the nonparametric PBR
method, and (indirectly) justify the parametric PBR method when the log-returns are Gaussian.

Notations. In this section, we make use of stochastic little-o and big-O notations: for a
given sequence of random variables $R_n$, $X_n = o_P(R_n)$ means $X_n = Y_n R_n$ where $Y_n \xrightarrow{P} 0$, and
$X_n = O_P(R_n)$ means $X_n = Y_n R_n$ where $Y_n = O_P(1)$, i.e. for every $\epsilon > 0$ there exists a constant
M such that $\sup_n P(|Y_n| > M) < \epsilon$.

Measurability Issues. We also encounter quantities that may not be measurable (e.g. the
supremum over uncountable families of measurable functions). We note that whenever the "probability"
of such a quantity is written down, we actually mean the outer probability. For further details,
see Appendix C of Pollard (1984).

4.1 Preliminaries 

The quantities $\hat{w}_n$ and $\hat{w}_n^v$ are solutions to non-trivial optimization problems so they cannot
be written down analytically, and it seems characterizing their asymptotic distributions would be
difficult. However, we are not at a complete loss. In statistics, an M-estimator^3 is an estimator
that minimizes an empirical function of the type

$$
\theta \mapsto M_n(\theta) := \frac{1}{n} \sum_{i=1}^{n} m_\theta(X_i), \tag{7}
$$

where $X_1, \ldots, X_n$ are iid observations, over some parameter space $\Theta$. The solution $\hat{\theta}_n$ is then a
reasonable estimator of the minimizer $\theta_0$ of the true mean $M(\theta) = E[m_\theta(X_1)]$. It is well-known
that $\hat{\theta}_n$ obeys the Central Limit Theorem (i.e. is asymptotically normally distributed) under some
regularity conditions. Intuitively, assuming $\theta$ is one-dimensional and $M_n$ is sufficiently smooth, the
CLT result is based on Taylor expansion of the first-order condition $dM_n(\hat{\theta}_n)/d\theta = 0$ about $\theta_0$:

$$
0 = \frac{dM_n(\hat{\theta}_n)}{d\theta} = \frac{dM_n(\theta_0)}{d\theta} + (\hat{\theta}_n - \theta_0) \frac{d^2 M_n(\theta_0)}{d\theta^2} + o_P(|\hat{\theta}_n - \theta_0|).
$$



^3 "M" stands for Minimization (or Maximization). For readers unfamiliar with M-estimation, maximum likelihood
estimation falls in this category.






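Under reasonable assumptions that $d^2 M_n(\theta_0)/d\theta^2$ obeys the Weak Law of Large Numbers and $\hat{\theta}_n$
is a consistent estimator of $\theta_0$ (i.e. $|\hat{\theta}_n - \theta_0| \xrightarrow{P} 0$), we have

$$
\sqrt{n}(\hat{\theta}_n - \theta_0) = -\left( E\left[ \frac{d^2 m_{\theta_0}(X_1)}{d\theta^2} \right] \right)^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{d m_{\theta_0}(X_i)}{d\theta} + o_P(1),
$$

with the latter expression obeying the standard CLT as it is a normalized sum of iid random
variables.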

So we ask, can we transform (CVaR-emp) and (CVaR-pen) to a problem for which we can use
the M-estimation results?

The first step towards transforming (CVaR-emp) and (CVaR-pen) is to make them into
constraint-free optimization problems. This is achievable, albeit with some thought, and we defer the details
to Sec. 4.2. Next, we need to show $\hat{w}_n$ and $\hat{w}_n^v$ are consistent, i.e. they converge in probability to
the corresponding population solutions. The proof of consistency is also provided in Sec. 4.2.

Once (CVaR-emp) is transformed to a global optimization problem, it is equivalent to an
M-estimation problem in that the objective is a sample average of iid random variables of the form
Eq. (7). Thus we conclude $\hat{w}_n$ is asymptotically normally distributed with mean $w_0$ and covariance
matrix $\Sigma_{pop}$, which we can compute.



However, (CVaR-pen) after transformation into a global problem is not quite an M-estimation
problem, because, after some algebra, the objective is of the form (see Sec. 4.2 for details):

$$
M_n(\theta) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} m_\theta^U(X_i, X_j), \tag{8}
$$

where $m_\theta^U(\cdot, \cdot)$ is a permutation-symmetric function, and the sum is over all possible pairs
$1 \le i < j \le n$, resulting in a sample average of identically distributed but non-independent terms.

For fixed $\theta$, statistics of the form Eq. (8) are known as U-statistics, and we believe the solution
$\hat{w}_n^v$ is still well-behaved because U-statistics can be decomposed into a term of the form
$M_n^1(\theta) = \frac{1}{n} \sum_{i=1}^{n} m_\theta^1(X_i)$ (known as its Hájek projection or first term in its Hoeffding decomposition; see
Hoeffding (1948)) and a remainder which converges to zero in probability faster than $1/\sqrt{n}$. Thus we
intuit that the asymptotic behavior of $\hat{w}_n^v$ is equivalent to that of the minimizer of $M_n^1(\theta)$, for
which we can apply the standard M-estimation result. We make this intuition rigorous in Sec. 4.3.
In Sec. 4.4 we provide details of the asymptotic distributions of $\hat{w}_n$ and $\hat{w}_n^v$ when $X \sim \mathcal{N}(\mu, \Sigma)$,
and provide a justification of the parametric PBR method.
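
To see why a sample-variance penalty turns the objective into an average over pairs of observations, the following sketch checks the underlying identity numerically: the sample variance equals the average of the kernel $(a - b)^2 / 2$ over all unordered pairs. The function name `pairwise_variance` and the data are hypothetical.

```python
from itertools import combinations
from statistics import variance


def pairwise_variance(values):
    """Average of the kernel (a - b)^2 / 2 over all unordered pairs."""
    pairs = list(combinations(values, 2))
    return sum((a - b) ** 2 / 2.0 for a, b in pairs) / len(pairs)


# Hypothetical tail excesses z_i; many are zero, as for losses below the cutoff.
z = [0.0, 0.0, 0.4, 1.3, 0.0, 2.1]
u_stat = pairwise_variance(z)
sample_var = variance(z)
```

The two quantities agree up to floating-point error, so a variance penalty is a U-statistic with a symmetric kernel of order two rather than a plain sample average of iid terms.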

4.2 Consistency of $\hat{w}_n$ and $\hat{w}_n^v$

In this subsection we show consistency of $\hat{w}_n^v = \hat{w}_n^v(\lambda_1, \lambda_2)$. The result goes through for $\hat{w}_n$ by
setting $\lambda_1 = \lambda_2 = 0$.

4.2.1 Transformation into global optimization 

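The penalized CVaR portfolio optimization problem with dualized mean and sample/asymptotic
variance penalty constraints is

$$
\begin{aligned}
\min_{(\alpha, w) \in \mathbb{R} \times \mathbb{R}^p}\quad & M_n(\alpha, w; \lambda_1, \lambda_2) \\
\text{s.t.}\quad & w^\top 1_p = 1
\end{aligned}
\tag{CVaR-dual}
$$

where

$$
M_n(\theta; \lambda_1, \lambda_2) = \frac{1}{n} \sum_{i=1}^{n} m_\theta(X_i) + \lambda_1 w^\top \hat{\Sigma}_n w + \frac{\lambda_2}{n-1} \sum_{i=1}^{n} \left( z_\theta(X_i) - \frac{1}{n} \sum_{j=1}^{n} z_\theta(X_j) \right)^2, \tag{9}
$$

$$
m_\theta(x) = \alpha + \frac{1}{1-\beta} z_\theta(x) - \lambda_0 w^\top x, \tag{10}
$$

and $\lambda_0 > 0$, $\lambda_1, \lambda_2 \ge 0$ are pre-determined constants.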

We dualize the mean constraint $w^\top \hat{\mu}_n = R$ because it makes the analysis of the corresponding
solution much easier. While dualizing the mean constraint adds a sample average of iid terms to 
the objective, leaving it as a constraint results in a solution that has a non-trivial dependence on 
the underlying randomness. 

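Now eliminating the non-random constraint $w^\top 1_p = 1$ is straightforward; one possible way is
to re-parameterize w as $w = w_1 + Lv$, where $L = [0_{(p-1) \times 1}, I_{(p-1) \times (p-1)}]^\top$, $v = [w_2, \ldots, w_p]^\top$ and
$w_1 = [1 - v^\top 1_{(p-1)}, 0_{1 \times (p-1)}]^\top$. The transformed problem is thus

$$
\min_{\theta \in \mathbb{R}^p} M_n(\theta; \lambda_1, \lambda_2), \tag{11}
$$

where $\theta = (\alpha, v) \in \mathbb{R} \times \mathbb{R}^{p-1}$ is free of constraints, and the corresponding population problem is

$$
\min_{\theta \in \mathbb{R}^p} M(\theta; \lambda_1, \lambda_2) = E[M_n(\theta; \lambda_1, \lambda_2)]. \tag{12}
$$

In what follows, we assume $M(\theta; \lambda_1, \lambda_2)$ has a unique minimizer $\theta_0(\lambda_1, \lambda_2)$. We also let $\hat{\theta}_n(\lambda_1, \lambda_2)$
be a near-minimizer of $M_n(\theta; \lambda_1, \lambda_2)$, i.e.

$$
M_n(\hat{\theta}_n; \lambda_1, \lambda_2) \le \inf_{\theta} M_n(\theta; \lambda_1, \lambda_2) + o_P(1). \tag{13}
$$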
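
The re-parameterization that removes the budget constraint can be sketched in a few lines: the free vector v holds the weights of assets 2 to p, and the first weight is set so that the portfolio sums to one. The function name `portfolio_from_free_weights` and the numbers are hypothetical.

```python
def portfolio_from_free_weights(v):
    """Map free weights v = (w_2, ..., w_p) to w = w_1 + L v with sum(w) = 1."""
    return [1.0 - sum(v)] + list(v)


# Hypothetical free weights for a p = 4 asset portfolio.
w = portfolio_from_free_weights([0.2, 0.5, -0.1])
```

Any real vector v gives a feasible w, so the optimization over $\theta = (\alpha, v)$ needs no explicit constraint.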



4.2.2 Transformation of the objective to a U-statistic 

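Let $\theta = (\alpha, v) \in \mathbb{R} \times \mathbb{R}^{p-1}$ and $z_\theta(x) := (-x^\top (w_1 + Lv) - \alpha)^+$. With simple algebra, we can
re-write the objective Eq. (9) as a U-statistic:

$$
M_n(\theta; \lambda_1, \lambda_2) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} m^U_{(\theta; \lambda_1, \lambda_2)}(X_i, X_j), \tag{14}
$$

where

$$
m^U_{(\theta; \lambda_1, \lambda_2)}(x_i, x_j) := \frac{1}{2} [m_\theta(x_i) + m_\theta(x_j)] + \frac{\lambda_1}{2} [(w_1 + Lv)^\top (x_i - x_j)]^2 + \frac{\lambda_2}{2} (z_\theta(x_i) - z_\theta(x_j))^2. \tag{15}
$$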



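4.2.3 Consistency of $\hat{\theta}_n(\lambda_1, \lambda_2)$

Let us now prove consistency of $\hat{\theta}_n(\lambda_1, \lambda_2)$ for fixed $\lambda_1, \lambda_2 \ge 0$. The intuition behind the proof
is as follows: if $M(\theta; \lambda_1, \lambda_2)$ is well-behaved such that for every $\epsilon > 0$ there exists $\eta > 0$ such that
$\|\hat{\theta}_n(\lambda_1, \lambda_2) - \theta_0(\lambda_1, \lambda_2)\|_2 > \epsilon \implies M(\hat{\theta}_n; \lambda_1, \lambda_2) - M(\theta_0; \lambda_1, \lambda_2) > \eta$, then consistency follows
from showing that the probability of the event $\{M(\hat{\theta}_n; \lambda_1, \lambda_2) - M(\theta_0; \lambda_1, \lambda_2) > \eta\}$ goes to zero
for all $\epsilon > 0$. In the proof, we show that
$0 \le M(\hat{\theta}_n; \lambda_1, \lambda_2) - M(\theta_0; \lambda_1, \lambda_2) \le -(M_n(\hat{\theta}_n; \lambda_1, \lambda_2) - M(\hat{\theta}_n; \lambda_1, \lambda_2)) + o_P(1)$,
hence the result follows by proving the Uniform Law of Large Numbers (ULLN)
for $M_n(\theta; \lambda_1, \lambda_2)$:

$$
\sup_{\theta \in \Theta} |M_n(\theta; \lambda_1, \lambda_2) - M(\theta; \lambda_1, \lambda_2)| \xrightarrow{P} 0. \tag{16}
$$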

ULLN has been extensively studied in the statistics and empirical processes literature and one
of the standard approaches to showing ULLN is through bracketing numbers. Given two functions
l, u, the bracket [l, u] is the set of all functions g with $l \le g \le u$. An $\epsilon$-bracket in $L_r(P)$ is a bracket
[l, u] with $E_P(u - l)^r < \epsilon^r$, and the bracketing number $N_{[\,]}(\epsilon, \mathcal{F}, L_r(P))$ is the minimum number of
$\epsilon$-brackets needed to cover $\mathcal{F}$. Having a finite bracketing number $N_{[\,]}(\epsilon, \mathcal{F}, L_r(P)) < \infty$ for every
$\epsilon > 0$ means one can find a finite approximation to $\mathcal{F}$ with $\epsilon$-accuracy for all $\epsilon > 0$, and ULLN
holds for such $\mathcal{F}$ [Theorem 19.4 of Van der Vaart (2000)].

There are certainly known sufficient conditions for finite bracketing numbers. For our problem,
if we can replace $\mathbb{R}^p$ with a compact set, we can show $\mathcal{F}$ is a Lipschitz class of functions (defined
in the next paragraph), which is known to have finite $N_{[\,]}(\epsilon, \mathcal{F}, L_r(P))$ for every $\epsilon > 0$. Now for all
practical purposes, we need only consider a compact subset of $\Theta$, $[-K, K]^p$ where K is appropriately
large, because the elements of $\theta = (\alpha, v)$ are only meaningful if bounded in size ($\alpha$ is the
Value-at-Risk of the portfolio $w = w_1 + Lv$). Hence for the rest of this section we assume a K
exists such that $\hat{\theta}_n \in [-K, K]^p$ for all n and $\theta_0 \in [-K, K]^p$.

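Definition 1 (Lipschitz class). Consider a class of measurable functions $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$,
$f_\theta : \mathcal{X} \to \mathbb{R}$, under some probability measure $P$. We say $\mathcal{F}$ is a Lipschitz class about $\theta_0 \in \Theta$ if
$\theta \mapsto f_\theta(x)$ is differentiable at $\theta_0$ for $P$-almost every $x$ with derivative $\dot f_{\theta_0}(x)$ and such that, for every
$\theta_1$ and $\theta_2$ in a neighborhood of $\theta_0$, there exists a measurable function $\dot f$ with $E[\dot f^2(X_1)] < \infty$ such
that

$$|f_{\theta_1}(x) - f_{\theta_2}(x)| \le \dot f(x)\,\|\theta_1 - \theta_2\|_2.$$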

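Example 19.7 of Van der Vaart (2000) shows that if $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ is a class of measurable
functions with bounded $\Theta \subset \mathbb{R}^d$ and $\mathcal{F}$ is Lipschitz about $\theta_0 \in \Theta$, then for every $0 < \epsilon < \mathrm{diam}(\Theta)$,
there exists $C$ such that

$$N_{[\,]}\left(\epsilon \|\dot f(X)\|_2, \mathcal{F}, L_2(P)\right) \le C \left(\frac{\mathrm{diam}(\Theta)}{\epsilon}\right)^d, \qquad (17)$$

i.e. $\mathcal{F}$ has a finite bracketing number for all $\epsilon > 0$. This result is needed in proving consistency in the
following.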

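Theorem 2. For fixed $\lambda_1, \lambda_2 \ge 0$, let $\hat\theta_n(\lambda_1,\lambda_2)$ be a near-minimizer of $M_n(\theta;\lambda_1,\lambda_2)$ as defined earlier,
and let $\theta_0(\lambda_1,\lambda_2)$ be the unique minimizer of $M(\theta;\lambda_1,\lambda_2)$. Also let

$$\mathcal{F}_1 = \{m_\theta : \theta \in [-K, K]^p\}, \qquad \mathcal{F}_2 = \{m^U_{(\theta;\lambda_1,\lambda_2)} : \theta \in [-K, K]^p\},$$

where $m_\theta$ and $m^U_{(\theta;\lambda_1,\lambda_2)}$ are the summands of the unpenalized and penalized objectives defined earlier. Suppose the following:

Assumption 1. $\theta \mapsto M(\theta;\lambda_1,\lambda_2)$ is continuous and $\liminf_{\|\theta\| \to \infty} M(\theta;\lambda_1,\lambda_2) > M(\theta_0;\lambda_1,\lambda_2)$.
Assumption 2. $X_1, \dots, X_n$ are iid continuous random vectors with finite fourth moment.

Then

$$\|\hat\theta_n(\lambda_1,\lambda_2) - \theta_0(\lambda_1,\lambda_2)\|_2 \xrightarrow{P} 0.$$
Proof. See Appendix C.1. □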
4.3 Central Limit Theorem for $\hat\theta_n(\lambda_1,\lambda_2)$

We are now ready to show the CLT for $\hat\theta_n(\lambda_1,\lambda_2)$. The CLT for $\hat\theta_n(0,0)$ is a straightforward
application of known M-estimation results for Lipschitz classes of objective functions [e.g. Theorem
5.23 of Van der Vaart (2000)].

The CLT for $\hat\theta_n(\lambda_1,\lambda_2)$ when $\lambda_1,\lambda_2$ are not both zero does not follow directly from
M-estimation results because $M_n(\theta;\lambda_1,\lambda_2)$ is a sample average of identically distributed but
non-independent terms. However, statistics of the form $M_n(\theta;\lambda_1,\lambda_2)$ are known as U-statistics, and
we can decompose them into a sum of iid random variables and a component which is $o_P(1/\sqrt{n})$
[Hoeffding (1948)]:

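$$M_n(\theta;\lambda_1,\lambda_2) = \frac{1}{n}\sum_{i=1}^n m^1_{(\theta;\lambda_1,\lambda_2)}(X_i) + E_n(\theta;\lambda_1,\lambda_2), \qquad (18)$$

where $m^1_{(\theta;\lambda_1,\lambda_2)}(X_1) = 2E_{X_2}[m^U_{(\theta;\lambda_1,\lambda_2)}(X_1, X_2)] - E_{X_1,X_2}[m^U_{(\theta;\lambda_1,\lambda_2)}(X_1, X_2)]$ and $E_n(\theta;\lambda_1,\lambda_2) = o_P(1/\sqrt{n})$.
Hence we suspect $|R_n(\hat\theta_n;\lambda_1,\lambda_2)| \xrightarrow{P} 0$, where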

1 " 

i2^(0; Ai, A2) = V^{e - 60) - [V2^Em;,^,^,,^)(X,)]-i-=^m;,^,^_,^)(X,). 

Now $\hat\theta_n$ changes with every $n$ so we need uniform probabilistic convergence of $R_n(\theta;\lambda_1,\lambda_2)$, and
implicitly of $E_n(\theta;\lambda_1,\lambda_2)$. For this we need to show convergence of particular stochastic processes:
an empirical process and a U-process.
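As an illustration of the decomposition in Eq. (18), the following minimal sketch uses a much simpler U-statistic than the penalized objective: the sample variance, with kernel $h(x,y) = (x-y)^2/2$. For this kernel the first-order term works out to $m^1(x) = (x-\mu)^2$. The kernel, the Gaussian sample, the seed and all function names are assumptions made for illustration only; they are not part of the paper's setup.

```python
import math
import random
import statistics


def pairwise_u_statistic(xs):
    """U-statistic with kernel h(x, y) = (x - y)^2 / 2, averaged over ordered pairs i != j."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += 0.5 * (xs[i] - xs[j]) ** 2
    return total / (n * (n - 1))


def first_order_average(xs, mu):
    """(1/n) * sum_i m1(X_i), where m1(x) = 2 E[h(x, X2)] - E[h(X1, X2)] = (x - mu)^2."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


random.seed(0)
n = 400
sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # true mean 0, true variance 1

u_stat = pairwise_u_statistic(sample)
# Remainder E_n of the decomposition; the theory says sqrt(n) * E_n is small for large n.
remainder = u_stat - first_order_average(sample, mu=0.0)

print("U-statistic:", u_stat)
print("sample variance:", statistics.variance(sample))
print("sqrt(n) * remainder:", math.sqrt(n) * remainder)
```

The U-statistic coincides with the usual unbiased sample variance, and the scaled remainder is of much smaller order than the $O_P(1)$ fluctuation of the iid sum, which is the behaviour the CLT argument relies on.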

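Definition 2. Let $X_1, \dots, X_n$ be iid random vectors from $\mathcal{X}$. For a measurable function $f : \mathcal{X} \to \mathbb{R}$,
the empirical process at $f$ is

$$\mathbb{G}_n f := \frac{1}{\sqrt{n}} \sum_{i=1}^n [f(X_i) - E f(X_1)],$$

and for a measurable function $g : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, the U-process at $g$ is

$$\mathbb{U}_n g := \frac{\sqrt{n}}{n(n-1)} \sum_{i \ne j} [g(X_i, X_j) - E_{X_1,X_2}\, g(X_1, X_2)].$$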

To show convergence of quantities such as $\sup_{t \in T} |X_n(t)|$ for some stochastic process $\{X_n(t) : t \in T\}$,
we need to introduce the notion of weak convergence of stochastic processes. If $X_n(\cdot, \omega)$ is
a bounded function for every $\omega \in \Omega$, then we can consider $X_n(\cdot, \omega)$ to be a point in the function
space $\ell^\infty(T)$, the space of bounded functions on $T$ equipped with the supremum norm.
Hence, the convergence of $\sup_{t \in T} |X_n(t)|$ can be obtained from weak convergence of $X_n$
in this function space.

Definition 3 (Weak convergence of a stochastic process). A sequence of $X_n : \Omega_n \to \ell^\infty(T)$
converges weakly to a tight random element* $X$ iff both of the following conditions hold:

1. Finite approximation: the sequence $(X_n(t_1), \dots, X_n(t_k))$ converges in distribution in $\mathbb{R}^k$ for
every finite set of points $t_1, \dots, t_k$ in $T$.



*A random element is a generalization of a random variable. Let $(\Omega, \mathcal{G}, P)$ be a probability space and $D$ a metric
space. Then the $\mathcal{G}$-measurable map $X : \Omega \to D$ is called a random element.
2. Maximal inequality: for every $\epsilon, \eta > 0$ there exists a partition of $T$ into finitely many sets
$T_1, \dots, T_k$ such that

$$\limsup_{n \to \infty} P\left( \sup_i \sup_{s,t \in T_i} |X_n(s) - X_n(t)| \ge \epsilon \right) \le \eta.$$



The point of this is that, as taking the supremum is a continuous map in the topology
of $\ell^\infty(T)$, weak convergence of $X_n(\cdot)$ to $X(\cdot)$ would allow us to conclude that $\sup_{t \in T} |X_n(t)|$
converges in distribution to $\sup_{t \in T} |X(t)|$.

Regarding empirical processes, we say a class of measurable functions $\mathcal{F}$ is $P$-Donsker if $\{\mathbb{G}_n f :
f \in \mathcal{F}\}$ converges weakly to a tight random element in $\ell^\infty(\mathcal{F})$. This property is related to the
bracketing numbers introduced in Sec. 4.2; a class $\mathcal{F}$ is $P$-Donsker if $\epsilon \log[N_{[\,]}(\epsilon, \mathcal{F}, L_2(P))] \to 0$
as $\epsilon \to 0$ [due to Donsker; see Theorem 19.5 of Van der Vaart (2000)]. Many sufficient conditions
for the weak convergence of $\{\mathbb{U}_n f : f \in \mathcal{F}\}$ are provided in Arcones and Giné (1993), and we make
use of one in our proof of the CLT for $\hat\theta_n(\lambda_1,\lambda_2)$ below.

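Theorem 3. Fix $\lambda_1, \lambda_2 \ge 0$, $\lambda_1, \lambda_2$ not both zero, and assume the same setting as Theorem 2. Also
let

$$\dot m^1_{(\theta_0;\lambda_1,\lambda_2)}(x) = \nabla_\theta m^1_{(\theta;\lambda_1,\lambda_2)}(x)\big|_{\theta=\theta_0(\lambda_1,\lambda_2)}, \quad \text{for } x \in \mathbb{R}^p,$$

and further assume

Assumption 3. $E_{X_1,X_2}\left[\|\dot m^U_{(\theta_0;\lambda_1,\lambda_2)}(X_1, X_2)\|_2^2\right] < \infty$.

Assumption 4. $\theta \mapsto M(\theta;\lambda_1,\lambda_2)$ admits a second-order Taylor expansion at its point of minimum
$\theta_0(\lambda_1,\lambda_2)$ with nonsingular symmetric second derivative matrix $V_{\theta_0(\lambda_1,\lambda_2)}$.
Then

$$\sqrt{n}\left(\hat\theta_n(\lambda_1,\lambda_2) - \theta_0(\lambda_1,\lambda_2)\right) = -V^{-1}_{\theta_0(\lambda_1,\lambda_2)} \frac{1}{\sqrt{n}} \sum_{i=1}^n \dot m^1_{(\theta_0;\lambda_1,\lambda_2)}(X_i) + o_P(1),$$

where $m^1_{(\theta;\lambda_1,\lambda_2)}$, given after Eq. (18), is the first-order term in the Hoeffding decomposition of $M_n(\theta;\lambda_1,\lambda_2)$.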

Remark — Implication on the choice of penalty functions. 

We have just shown that asymptotically, the sample variance penalty functions affect the solution 
performance only through its Hajek projection. This observation can generalize to many typical 
penalty functions (e.g. different statistics of mean and CVaR estimators), and as such, the impli- 
cation is that of all possible penalty functions to consider, one may focus on a subclass of functions 
that can be expressed as a sample average of iid terms. 

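Corollary 1. Assume the same setting as Theorem 3. Then

$$\sqrt{n}\left(\hat\theta_n(\lambda_1,\lambda_2) - \theta_0(\lambda_1,\lambda_2)\right) \Rightarrow \mathcal{N}(0, \Sigma_{\theta_0}(\lambda_1,\lambda_2)), \qquad (19)$$
where $\Sigma_{\theta_0}(\lambda_1,\lambda_2) = A^{-1} B A^{-1}$,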



0=00 



where 

^0 "^(0;Ai,A2)(2;) 



-00 



j^L'^XI - XqL^X + 2XiL^{X -ii){X- ^i^w + 2X2E[{ze{X) - Ez0{X)){-L^ XI + EL^ XI)] 



1 - + 2X2E[{ze{X) - Eze{X)){-l + E 

andl = l{ze{X) > 0). 
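Remarks.

1. For asymptotics of $\hat w_n(\lambda_1,\lambda_2)$ we have

$$\sqrt{n}\left(\hat w_n(\lambda_1,\lambda_2) - w_0(\lambda_1,\lambda_2)\right) \Rightarrow \mathcal{N}(0, \Sigma_{w_0}(\lambda_1,\lambda_2)), \qquad (20)$$
where $\Sigma_{w_0}(\lambda_1,\lambda_2) = (0_p \; L)\,\Sigma_{\theta_0}(\lambda_1,\lambda_2)\,(0_p \; L)^\top$.

2. Setting $\lambda_1 = \lambda_2 = 0$, we get back the unpenalized mean-CVaR problem.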

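3. Asymptotic distribution of the efficient frontier.

With Eq. (20), we can state the distribution of the true efficient frontier, that is, the distribution
of $\hat w_n(\lambda_1,\lambda_2)^\top \mu$ and $g(\hat w_n(\lambda_1,\lambda_2)) := CVaR(-\hat w_n(\lambda_1,\lambda_2)^\top X_{n+1}; \beta)$, where $X_{n+1} \sim F$,
independent of $X_1, \dots, X_n$. For the portfolio mean, we have

$$\sqrt{n}\left(\hat w_n(\lambda_1,\lambda_2)^\top \mu - w_0(\lambda_1,\lambda_2)^\top \mu\right) \Rightarrow \mathcal{N}\left(0, \mu^\top \Sigma_{w_0}(\lambda_1,\lambda_2)\, \mu\right)$$
and for the true CVaR, by the delta method,
$$\sqrt{n}\left(g(\hat w_n(\lambda_1,\lambda_2)) - g(w_0(\lambda_1,\lambda_2))\right) \Rightarrow \mathcal{N}\left(0, g'(w_0(\lambda_1,\lambda_2))^\top \Sigma_{w_0}(\lambda_1,\lambda_2)\, g'(w_0(\lambda_1,\lambda_2))\right). \qquad (21)$$

The asymptotic distribution of $g(\hat w_n(\lambda_1,\lambda_2))$ clearly depends on the distribution of the assets
$X$. In the case when $X \sim \mathrm{Ellip}(\mu, \Sigma, Y)$, $g(w) = -w^\top \mu + G\sqrt{w^\top \Sigma w}$ according to our
previous calculations in Eq. (6). Hence

$$\sqrt{n}\left(g(\hat w_n) - g(w_0)\right) \Rightarrow \mathcal{N}\left(0, \left(-\mu + G\frac{\Sigma w_0}{\sqrt{w_0^\top \Sigma w_0}}\right)^\top \Sigma_{w_0} \left(-\mu + G\frac{\Sigma w_0}{\sqrt{w_0^\top \Sigma w_0}}\right)\right). \qquad (22)$$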
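The delta-method variance in Eq. (22) can be evaluated numerically once $\mu$, $\Sigma$, $w_0$ and $\Sigma_{w_0}$ are available. The sketch below does this for the Gaussian special case, where the multiplier is $G = \phi(\Phi^{-1}(\beta))/(1-\beta)$ (the standard normal CVaR). All numerical inputs are hypothetical placeholders, not the paper's simulation parameters, and the helper names are illustrative.

```python
import math
from statistics import NormalDist


def gaussian_cvar_multiplier(beta):
    """G = pdf(Phi^{-1}(beta)) / (1 - beta); valid for the Gaussian case only."""
    std_normal = NormalDist()
    return std_normal.pdf(std_normal.inv_cdf(beta)) / (1.0 - beta)


def mat_vec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cvar(w, mu, sigma, G):
    """g(w) = -w'mu + G * sqrt(w' Sigma w)."""
    return -dot(w, mu) + G * math.sqrt(dot(w, mat_vec(sigma, w)))


def cvar_gradient(w, mu, sigma, G):
    """Gradient of g: -mu + G * Sigma w / sqrt(w' Sigma w)."""
    sigma_w = mat_vec(sigma, w)
    scale = G / math.sqrt(dot(w, sigma_w))
    return [-m + scale * s for m, s in zip(mu, sigma_w)]


def delta_method_variance(w0, mu, sigma, G, sigma_w0):
    """Asymptotic variance grad' Sigma_w0 grad of sqrt(n) * (g(w_n) - g(w0))."""
    grad = cvar_gradient(w0, mu, sigma, G)
    return dot(grad, mat_vec(sigma_w0, grad))


# Hypothetical two-asset inputs, chosen only for illustration.
mu = [0.05, 0.08]
sigma = [[0.04, 0.01], [0.01, 0.09]]
w0 = [0.6, 0.4]
sigma_w0 = [[0.5, -0.5], [-0.5, 0.5]]  # placeholder asymptotic covariance of w_n

G = gaussian_cvar_multiplier(0.95)
grad = cvar_gradient(w0, mu, sigma, G)
avar = delta_method_variance(w0, mu, sigma, G, sigma_w0)
print("G:", G)
print("gradient of g at w0:", grad)
print("asymptotic variance of the CVaR:", avar)
```

Dividing `avar` by $n$ and taking the square root gives the approximate standard deviation of the true CVaR of the estimated portfolio for a sample of size $n$.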

4.4 Example. Asymptotic analysis for $X \sim \mathcal{N}(\mu, \Sigma)$

In the following, we provide the detailed computation of $\Sigma_{\theta_0}(0,0)$ for the unpenalized solution
$\hat\theta_n(0,0)$ when $X \sim \mathcal{N}(\mu, \Sigma)$.
Lemma 3. Suppose $X \sim \mathcal{N}(\mu, \Sigma)$. Then



Z6)o(X) = -w^ X - ao aoM{-^ H/3),l), and 



where o"o = y WqTjWq. Then Sfii„(0, 0) = Aq^ BqAq^ , where Aq, Bq are symmetric matrices with 

Po 



^o(l,l) 



1-/3 



MjJ) =j^^^nLjXLjX\ze,{X) = 0] for2<j,l<p 
Mhj) =-j:^^E[L]X\ze,iX) = 0] for2<j<p, 



where Lj is the j-th column of L, and 



So(l,l) 



1-/3 

\1{L]i:Li + q^iLj^L) + jy^^ (^-^ + 2Ao) nL]XLlXl{ze, {X) > 0)] /or 2<j,l<p 
Bo{l,j) =0 /or2<i<p. 

Proof. This is a straightforward application of Corollary 1 for the case $X \sim \mathcal{N}(\mu, \Sigma)$. □

Let us now compare the asymptotic results derived above with simulations with a finite number
of observations. Consider 5 assets, a range of observations ($n = 250, 500, 1000, 2000$) and $X \sim
\mathcal{N}(\mu_{sim}, \Sigma_{sim})$, where the model parameters are the same as the model parameters of the first five
assets used in Sec. 2.2. For simulations, we solve the mean-CVaR problem with dualized mean
constraint:

$$\min_w \; \widehat{CVaR}_n(-w^\top X; \beta) - \lambda_0 w^\top \hat\mu_n \quad \text{s.t.} \quad w^\top 1_p = 1,$$

and follow steps similar to Sec. 2.2.

In Fig. 2 we summarize the empirical frontiers by plotting their averages and indicating ±1/2
standard deviation error bars, in both true mean (vertical) and true risk estimations (horizontal)
in grey. The population frontier is also plotted, and is shown in green, and the theoretical ±1/2
standard deviations of mean and risk estimations are juxtaposed with the empirical error bars in
red. We make a couple of observations:

1. With increasing $n$, the theoretical error bars approach the simulated ones, as expected.

2. The theory seems to predict the mean estimation error (vertical) better than the risk
estimation error (horizontal). With finite $n$, the mean estimation error, which is computed
using Eq. (21), depends only on one approximate quantity $\Sigma_{w_0}(0,0)$, whereas the risk
estimation error, computed using Eq. (22), depends on $\Sigma_{w_0}(0,0)$ and $w_0$. Although $\hat w_n$ is a
consistent estimator of $w_0$, with finite $n$ the difference does play a role, as
shown by the relative inaccuracy of the horizontal error bars compared to the vertical ones.
The finite sample bias also explains the gap in the positions of the population and simulated
efficient frontiers.
[Figure 2: four panels, (a) n = 250 through (d) n = 2000; horizontal axis: CVaR (%/day).]

Figure 2: Comparison of theoretical (red) and simulated (grey) distributions of the empirical efficient frontier
when $X \sim \mathcal{N}(\mu_{sim}, \Sigma_{sim})$ for increasing number of observations n = [250, 500, 1000, 2000]. The error bars
indicate ±1/2 std variabilities in the mean and CVaR. Green is the population efficient frontier, and blue
indicates the portion that corresponds to the return range considered for the simulations. Observe that the
asymptotic variance calculated theoretically (red bars) approaches the simulated variance (grey bars) with
increasing n.



Let us now derive asymptotic properties of the penalized solution $\hat\theta_n(\lambda_1,\lambda_2)$, $\lambda_1, \lambda_2 \ge 0$, when
$X \sim \mathcal{N}(\mu, \Sigma)$. First, we show that when $X \sim \mathcal{N}(\mu, \Sigma)$, penalizing the variance of CVaR estimation is
redundant if one penalizes the sample variance of the mean.

Lemma 4. Suppose $X \sim \mathcal{N}(\mu, \Sigma)$ and let $z_\theta(X) = -\alpha - w^\top X$. Then $z_\theta(X) \sim \mathcal{N}(\mu_1, \sigma_1^2)$ where
$\mu_1 = -\sigma_1 \Phi^{-1}(\beta)$, $\sigma_1^2 = w^\top \Sigma w$, and

$$Var[\max(z_\theta(X), 0)] = C(\beta)\,\sigma_1^2,$$

where $C(\beta)$ is a constant that only depends on $\beta$. Thus penalizing the sample variance of CVaR
via $P_2(w) = Var_n[\max(z_{\theta(w)}(X), 0)] \le U_2$ is redundant if one penalizes the sample variance of the mean
via $P_1(w) = w^\top \hat\Sigma_n w = \hat\sigma_1^2 \le U_1$.

Proof. Straight-forward calculations show 

Var[m^x{ze{X),0)] = + 1)(1 - /3) - 3$-H/3)/zo [^"H/?)]} 

where fzo is the pdf of the standard normal random variable Zq. □ 
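The scaling claim of Lemma 4 can be checked numerically. The sketch below computes a closed form for $C(\beta)$ obtained from the standard truncated-normal moments $E[(Z-c)^+] = \phi(c) - c(1-\beta)$ and $E[((Z-c)^+)^2] = (1+c^2)(1-\beta) - c\,\phi(c)$ with $c = \Phi^{-1}(\beta)$, and compares it with direct numerical integration. This closed form is an independent derivation offered as an assumption for illustration; it is not a transcription of the expression in the proof above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def c_beta(beta):
    """Closed-form Var[max(Z - c, 0)] for Z ~ N(0, 1), with c = Phi^{-1}(beta)."""
    c = STD_NORMAL.inv_cdf(beta)
    pdf_c = STD_NORMAL.pdf(c)
    tail = 1.0 - beta
    first_moment = pdf_c - c * tail
    second_moment = (1.0 + c * c) * tail - c * pdf_c
    return second_moment - first_moment ** 2


def c_beta_numeric(beta, upper=12.0, steps=200_000):
    """Midpoint-rule check of the same variance by direct integration over [c, upper]."""
    c = STD_NORMAL.inv_cdf(beta)
    width = (upper - c) / steps
    m1 = 0.0
    m2 = 0.0
    for k in range(steps):
        z = c + (k + 0.5) * width
        weight = STD_NORMAL.pdf(z) * width
        m1 += (z - c) * weight
        m2 += (z - c) ** 2 * weight
    return m2 - m1 ** 2


def variance_of_positive_part(beta, sigma1):
    """Var[max(z_theta(X), 0)] = C(beta) * sigma1^2 when z_theta(X) ~ N(-sigma1 * c, sigma1^2)."""
    return c_beta(beta) * sigma1 ** 2


closed_form = c_beta(0.95)
numeric = c_beta_numeric(0.95)
print("C(0.95) closed form:", closed_form)
print("C(0.95) numeric:", numeric)
print("variance for sigma1 = 2:", variance_of_positive_part(0.95, 2.0))
```

Because the variance depends on the portfolio only through $\sigma_1^2 = w^\top \Sigma w$, a bound on $\sigma_1^2$ automatically bounds the variance of the CVaR summand, which is the redundancy the lemma describes.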

The implication now is that when $X \sim \mathcal{N}(\mu, \Sigma)$, we need only consider $\lambda_1 \ge 0$, $\lambda_2 = 0$ to
characterize the asymptotic properties of the penalized solution, which we describe below.
Lemma 5. Suppose $X \sim \mathcal{N}(\mu, \Sigma)$. Then



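where $A_1, B_1$ are symmetric matrices with

$A_1(1,1) = A_0(1,1)$
$A_1(j,l) = A_0(j,l) + \lambda_1 L_j^\top \Sigma L_l$ for $2 \le j,l \le p$
$A_1(1,j) = A_0(1,j)$ for $2 \le j \le p$

where $L_j$ is the $j$-th column of $L$, and

$B_1(1,1) = B_0(1,1)$
$B_1(j,l) = B_0(j,l) + \lambda_1 E[b_{0,j} b_{1,l} + b_{0,l} b_{1,j} + \lambda_1 b_{1,j} b_{1,l}]$ for $2 \le j,l \le p$
$B_1(1,j) = B_0(1,j) + \lambda_1 E[b_{0,1} b_{1,j}]$ for $2 \le j \le p$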


where for 2 < j,l < p, 

IE[6o,i&i,/] = 2LiSw; 



1-/3 



E[lJxlJ{X - fi)w^{X - fi)I{ze^{X) > 0)] - 2X^1] ^jlLJt. 



w 



4E[L]{X - n){X - fi^Liw^iX - fi){X - fi^w] 



E[LjiX - t,)w^iX - m^eoiX) > 



1-/3 



Proof. This is a straightforward application of Corollary 1 for the case $X \sim \mathcal{N}(\mu, \Sigma)$. □



Remark — Justification of the parametric PBR method. 

The nonparametric PBR method with only a penalty on the mean estimation is a linear
combination of the empirical mean-CVaR problem (CVaR-emp) and the empirical Markowitz problem
(Mark-emp) because the penalty is precisely the portfolio variance estimate $w^\top \hat\Sigma_n w$. In particular,
this single-penalty problem approaches (Mark-emp) with increasing $\lambda_1$. In Figure 3 we plot 1 std
of $\hat w_n(\lambda_1, 0)^\top \mu$ and $CVaR(-\hat w_n(\lambda_1, 0)^\top X; \beta)$ for the single-penalty problem as $\lambda_1$ is increased, for
different values of $\lambda_0$, computed using Lemma 5. Observe that the asymptotic standard deviations
for both portfolio mean and CVaR decrease with increasing $\lambda_1$, uniformly in $\lambda_0$. Given that both
solutions to (CVaR-emp) and (Mark-emp) converge to the population solution $w_0$, the asymptotic
theory deems the empirical Markowitz solution superior.



5 Numerical results 



In this section, we present simulation results to evaluate the nonparametric and parametric
PBR methods presented in Sec. 3 against the straightforward approach (CVaR-emp). We consider
p = 10 assets and three distributional models for the asset log-returns: X is multivariate Gaussian,
elliptical, or a mixture of multivariate Gaussian and negative exponential. For each model, we
follow the procedure outlined in Section 2 to construct sample efficient frontiers corresponding to
(CVaR-emp), (CVaR-pen) and (Mark-emp).



Figure 3: One asymptotic std of the portfolio mean and CVaR for the single-penalty problem as $\lambda_1$ is increased
when $X \sim \mathcal{N}(\mu, \Sigma)$, for different values of $\lambda_0$.

One question that arises while solving (CVaR-pen) is how one chooses the penalty terms $U_1$
and $U_2$ in the constraints



n 

Z~^0,nZ < U2- 



n{l - 



If $U_1, U_2$ are too small, the problem becomes infeasible, whereas if they are too large, the
penalization does not have any effect. It is sensible to choose $U_1, U_2$ as proportions of the left-hand
sides of the two constraints evaluated at $(\hat w_n, \hat z_n)$, the solution to the unpenalized problem
(CVaR-emp); for $U_1$ this quantity is $\hat w_n^\top \hat\Sigma_n \hat w_n / n$. We denote the proportions $r_1$ and $r_2$
respectively. In practice, one would perform cross-validation to find values of
$(r_1, r_2) \in [0, 1] \times [0, 1]$ that maximize out-of-sample performance.



5.1 Gaussian/elliptical models 

Here we consider 

where A is as in (5), with A = 1 for a Gaussian model and $A \sim \Gamma(3, 0.5)$ for an elliptical model.
The parameters $\mu_{sim}$ and $\Sigma_{sim}$ are the same as those used in Sec. 2.2. We plot the histograms
for 100,000 sample returns for an equally-weighted portfolio $w = 1_p/p$ under the Gaussian and
elliptical models in Fig. 4.

We summarize the simulation results in Fig. 5, where $(r_1, r_2) = (0.92, 1)$ for both the Gaussian
and elliptical models [recall that the second penalty is redundant due to Lemma 4]. Notice that for
both models, the empirical Markowitz efficient frontier dominates the penalized efficient frontier
which in turn dominates the empirical mean-CVaR efficient frontier, in both position of the average
of the simulated frontiers and variability, as indicated by the vertical and horizontal error bars.

For the Gaussian case, $r_1 = 0.92$ was just feasible in that further reduction in this value led to
most instances of the problem being infeasible. From Fig. 5, we can see that this is because the
penalized solutions are approaching the empirical Markowitz solutions with this choice of $r_1$, as the
average simulated efficient frontiers of penalized (grey) and empirical Markowitz (blue) solutions
are close. For the elliptical model, $r_1 = 0.92$ could be further reduced with the resulting penalized
efficient frontier approaching the empirical Markowitz efficient frontier. In summary, the empirical
Markowitz solutions perform uniformly better than both the original and penalized mean-CVaR
solutions, with the penalized efficient frontier nearing the empirical Markowitz efficient frontier
with decreasing $r_1$.

5.2 Mixture model 

Let us now consider returns being driven by a mixture of multivariate normal and negative
exponential distributions, such that with a small probability, all assets undergo a perfectly correlated
exponential-tail loss. Formally, $X$ is distributed as in Eq. (23),
where $(\mu_{sim}, \Sigma_{sim})$ are parameters with the same value as in the Gaussian/elliptical models, $I(q) \sim
\mathrm{Bernoulli}(q)$, $f = [f_1, \dots, f_p]^\top$ is a $p \times 1$ vector of constants, and $Y$ is a negative exponential
random variable with density



In our simulations, we consider q = 0.05, fi = fii — i/^m fo^' ^ = Ij • • • a^^id A = 1. The histogram 

for 100, 000 sample returns of an equally-weighted portfolio under this mixture model is shown in 



We summarize the simulation results in Fig. (4b), where (ri,r2) = (0.5,0.5). In this case, 
the penalized efficient frontiers perform better on average than the efficient frontiers generated by 
the other two methods. The empirical Markowitz efficient frontiers do not seem to perform any 
better than the original efficient frontiers on average, which is not surprising because the empirical 
Markowitz solution is only intended for X having an elliptical distribution. 

6 Conclusion 

We investigate Performance-Based Regularization as a method to reduce estimation risk in 
empirical mean-CVaR portfolio optimization. The nonparametric PBR method solves the empirical 
mean-CVaR problem with penalties on the uncertainties in mean and CVaR estimations. The 
parametric PBR method solves the empirical Markowitz problem instead if the underlying model 
is elliptically distributed. Both theoretical analysis and simulation experiments show the PBR 
methods improve upon the naive approach to data-driven mean-CVaR portfolio optimization. 

From a larger perspective, the PBR approach is a new and promising way of dealing with 
estimation risk and introducing robustness to data-driven optimization, and is not restricted to the 
mean-CVaR problem. We leave investigating PBR in a general problem context for future work. 

Acknowledgements 

This research was supported in part by the NSF CAREER Awards DMS-0847647 (El Karoui), 
CMMI-0348746 (Lim), and NSF Grants CMMI-1031637 and CMMI-1201085 (Lim). The opinions, 
findings, conclusions or recommendations expressed in this material are those of the authors and do 
not necessarily reflect the views of the National Science Foundation. The authors also acknowledge 
support from an Alfred P. Sloan Research Fellowship (El Karoui), the Coleman Fung Chair in 



$$X \sim (1 - I(q))\,\mathcal{N}(\mu_{sim}, \Sigma_{sim}) + I(q)\,(Y 1_p + f) \qquad (23)$$
Figure 5: Average of population risk vs return for solutions to (CVaR-emp) in grey, (CVaR-pen) in red and
(Mark-emp) in blue under (a) Gaussian model and (b) elliptical model. Green curve denotes the population
efficient frontier. Horizontal and vertical lines show ±1/2 std error.
Figure 6: (a) Distribution of returns for an equally weighted portfolio under the mixture model, (b) Average
of population risk vs return for solutions to (CVaR-emp) in grey, (CVaR-pen) in red and (Mark-emp) in
blue under the mixture model. Green curve denotes the population efficient frontier. Horizontal and vertical
lines show ±1/2 std error.



A Asymptotics of the CVaR estimator 

Setting. Let $L = [L_1, \dots, L_n]$ be $n$ iid observations (of portfolio losses) from a distribution
$F$ which is absolutely continuous, has a twice continuously differentiable pdf and a finite second
moment.

In this section, we prove the asymptotic distribution of the estimator $\widehat{CVaR}_n(L; \beta)$ introduced
in Eq. (2) of Sec. 2.1. First, we define a closely related CVaR estimator:

Definition 4 (Type 1 CVaR estimator). For $\beta \in (0.5, 1)$, we define the Type 1 CVaR estimator
to be

$$CV1_n(L; \beta) := \min_{\alpha \in \mathbb{R}} \; (1 - \epsilon_n)\alpha + \frac{1}{n - \lceil n\beta \rceil + 1} \sum_{i=1}^n (L_i - \alpha)^+,$$

where $\epsilon_n$ is some constant satisfying $0 < \epsilon_n < (n - \lceil n\beta \rceil + 1)^{-1}$, $\sqrt{n}\,\epsilon_n \to 0$.

Now consider the following CVaR estimator, expressed without the minimization:

Definition 5 (Type 2 CVaR estimator). For $\beta \in (0.5, 1)$, we define the Type 2 CVaR estimator
to be

$$CV2_n(L; \beta) := \frac{1}{n - \lceil n\beta \rceil + 1} \sum_{i=1}^n L_i\, \mathbb{1}(L_i \ge \hat\alpha_n(\beta)),$$

where $\hat\alpha_n(\beta) := L_{(\lceil n\beta \rceil)}$, the $\lceil n\beta \rceil$-th order statistic of the sample $L_1, \dots, L_n$.

The Type 2 CVaR estimator is asymptotically normally distributed [Chen (2008)]. In the remainder
of this section, we show that $CV2_n(L; \beta)$ is asymptotically equivalent to $CV1_n(L; \beta)$, which is in turn
asymptotically equivalent to $\widehat{CVaR}_n(L; \beta)$. We then conclude $\widehat{CVaR}_n(L; \beta)$ is also asymptotically
normal, converging to the same asymptotic distribution as $CV2_n(L; \beta)$.
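Proposition 1. The solution $\alpha^* = L_{(\lceil n\beta \rceil)}$ is unique to the one-dimensional optimization problem

$$\min_{\alpha \in \mathbb{R}} \left\{ G_n(\alpha) := (1 - \epsilon_n)\alpha + \frac{1}{n - \lceil n\beta \rceil + 1} \sum_{i=1}^n (L_i - \alpha)^+ \right\},$$

where $\epsilon_n$ is some constant satisfying $0 < \epsilon_n < (n - \lceil n\beta \rceil + 1)^{-1}$.

Proof. The expression to be minimized is a piecewise linear convex function with nodes at $L_1, \dots, L_n$.
We show that $G_n(\alpha)$ has gradients of opposite signs about a single point, $L_{(\lceil n\beta \rceil)}$, hence this point
must be the unique optimal solution. Now consider, for $m \in \{-\lceil n\beta \rceil + 1, \dots, n - \lceil n\beta \rceil - 1\}$:

$$\Delta(m) = G_n(L_{(\lceil n\beta \rceil + m + 1)}) - G_n(L_{(\lceil n\beta \rceil + m)})$$
$$= (1 - \epsilon_n)\left(L_{(\lceil n\beta \rceil + m + 1)} - L_{(\lceil n\beta \rceil + m)}\right) - \frac{1}{n - \lceil n\beta \rceil + 1} \sum_{i=1}^n \left[ (L_i - L_{(\lceil n\beta \rceil + m)})^+ - (L_i - L_{(\lceil n\beta \rceil + m + 1)})^+ \right],$$

where

$$\sum_{i=1}^n \left[ (L_i - L_{(\lceil n\beta \rceil + m)})^+ - (L_i - L_{(\lceil n\beta \rceil + m + 1)})^+ \right] = (n - \lceil n\beta \rceil - m)\left(L_{(\lceil n\beta \rceil + m + 1)} - L_{(\lceil n\beta \rceil + m)}\right).$$

Thus

$$\Delta(m) = \left(L_{(\lceil n\beta \rceil + m + 1)} - L_{(\lceil n\beta \rceil + m)}\right) \left[ (1 - \epsilon_n) - \frac{n - \lceil n\beta \rceil - m}{n - \lceil n\beta \rceil + 1} \right].$$

Now $\Delta(0) > 0$ since $(L_{(\lceil n\beta \rceil + 1)} - L_{(\lceil n\beta \rceil)}) > 0$ and $(1 - \epsilon_n) > (n - \lceil n\beta \rceil)(n - \lceil n\beta \rceil + 1)^{-1}$ by the
restriction on $\epsilon_n$; and $\Delta(-1) < 0$ since $(L_{(\lceil n\beta \rceil)} - L_{(\lceil n\beta \rceil - 1)}) > 0$ and $(1 - \epsilon_n) < 1$, again by the
choice of $\epsilon_n$. Thus $G_n(\alpha)$ has a unique minimum at $\alpha^* = L_{(\lceil n\beta \rceil)}$. □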

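Remark. Note if $\epsilon_n = 0$, then multiple solutions occur because $\Delta(-1) = 0$.
Corollary 2. Type 1 and Type 2 CVaR estimators are related by

$$CV2_n(L; \beta) = CV1_n(L; \beta) + \epsilon_n L_{(\lceil n\beta \rceil)}.$$

Proof. Rewriting the Type 2 CVaR estimator, and using that exactly $n - \lceil n\beta \rceil + 1$ observations satisfy $L_i \ge L_{(\lceil n\beta \rceil)}$:
$$CV2_n(L; \beta) = \frac{1}{n - \lceil n\beta \rceil + 1} \sum_{i=1}^n L_i\, \mathbb{1}(L_i \ge L_{(\lceil n\beta \rceil)})$$
$$= L_{(\lceil n\beta \rceil)} + \frac{1}{n - \lceil n\beta \rceil + 1} \sum_{i=1}^n (L_i - L_{(\lceil n\beta \rceil)})^+$$
$$= CV1_n(L; \beta) + \epsilon_n L_{(\lceil n\beta \rceil)},$$
where the final equality is due to Proposition 1. □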
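Proposition 1 and Corollary 2 can be checked on simulated data. The following minimal sketch evaluates $G_n$ at the sample points (sufficient because $G_n$ is piecewise linear and convex with nodes at the observations), locates the minimizer, and compares the two estimators. The Gaussian losses, the seed, the choice $\epsilon_n = 0.5/(n - \lceil n\beta \rceil + 1)$ and the function names are illustrative assumptions.

```python
import math
import random


def type1_objective(losses, beta, eps, alpha):
    """G_n(alpha) = (1 - eps) * alpha + sum_i (L_i - alpha)^+ / (n - ceil(n*beta) + 1)."""
    n = len(losses)
    tail_count = n - math.ceil(n * beta) + 1
    excess = sum(max(loss - alpha, 0.0) for loss in losses)
    return (1.0 - eps) * alpha + excess / tail_count


def type1_cvar(losses, beta, eps):
    """Minimise G_n over the sample points, which are the nodes of the objective."""
    best_alpha = min(losses, key=lambda a: type1_objective(losses, beta, eps, a))
    return type1_objective(losses, beta, eps, best_alpha), best_alpha


def type2_cvar(losses, beta):
    """Average of the losses at or above the ceil(n*beta)-th order statistic."""
    n = len(losses)
    k = math.ceil(n * beta)
    tail = sorted(losses)[k - 1:]
    return sum(tail) / (n - k + 1)


random.seed(1)
n, beta = 203, 0.95
losses = [random.gauss(0.0, 1.0) for _ in range(n)]
tail_count = n - math.ceil(n * beta) + 1
eps = 0.5 / tail_count  # any value in (0, 1 / tail_count) is admissible

cv1, alpha_star = type1_cvar(losses, beta, eps)
cv2 = type2_cvar(losses, beta)
print("minimizer of G_n:", alpha_star)
print("order statistic L_(ceil(n*beta)):", sorted(losses)[math.ceil(n * beta) - 1])
print("CV2 - (CV1 + eps * alpha*):", cv2 - (cv1 + eps * alpha_star))
```

The minimizer coincides with the $\lceil n\beta \rceil$-th order statistic and the difference in the last line vanishes up to rounding, in line with the two results above.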

We now show asymptotic normality of $CV1_n(L; \beta)$.
Lemma 6. The Type 1 CVaR estimator is asymptotically normal as follows:

$$\frac{\sqrt{n}(1 - \beta)}{\gamma_0} \left( CV1_n(L; \beta) - CVaR(L_1; \beta) \right) \Rightarrow \mathcal{N}(0, 1), \qquad (24)$$
where $\gamma_0^2 = Var[(L_1 - \alpha_\beta)\,\mathbb{1}(L_1 \ge \alpha_\beta)]$, and $\alpha_\beta = \inf\{\alpha : P(L_1 \ge \alpha) \le 1 - \beta\}$, the Value-at-Risk
of the random loss $L_1$ at level $\beta$.

Proof. Asymptotic normality for the Type 2 CVaR estimator is proven in Chen (2008), and the result
is immediate from invoking Slutsky's lemma on Corollary 2 and the assumption $\sqrt{n}\,\epsilon_n \to 0$. □

A.1 Proof of Lemma 1

The asymptotic distribution of $\widehat{CVaR}_n(L; \beta)$ is the same as that of $CV1_n(L; \beta)$ because

$$\sqrt{n}\left[\widehat{CVaR}_n(L; \beta) - CV1_n(L; \beta)\right] = o_P(1).$$

B Proof of Theorem 1 

Lemma 7. Consider the optimization problem 

min z~^ln 

S.t. Zi > 

Zi > 
Z~^Q.nZ < 

where q > V i, / > 0, il„ = (n — — n^^l^l^), the sample covariance operator. Suppose 

125\) is feasible with an optimal solution (x* , z*). Let Si(z) := {1 < i < n : Zi = 0}, 52(2:) := {1 < 
i < n : Zi = Cj} and V{z) := Si D S2 (i.e. V{z) is the set of indices for which Zi > max(0,Cj)j. 
Then the optimal solution z* falls into one of two cases: either Si{z*) 7^ and V{z*) = 0, or 
Si{z*) = and V{z*) / 0. 

Proof. The problem (25) is a convex optimization problem because $\Omega_n$ is a positive semidefinite
matrix. The problem is also strictly feasible, since $z_0 = 2\max_j\{c_j\} 1_n$ is a strictly feasible point:
clearly, $z_{0,i} > \max\{0, c_i\}$ for all $i$ and $z_0^\top \Omega_n z_0 = 0 < f$ as $1_n$ is orthogonal to $\Omega_n$. Thus Slater's condition
for strong duality holds, and we can derive properties of the optimal solution by examining KKT
conditions.
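The two properties of $\Omega_n = (n-1)^{-1}(I_n - n^{-1} 1_n 1_n^\top)$ used in this step, namely that $z^\top \Omega_n z$ is the sample variance of the entries of $z$ and that $\Omega_n 1_n = 0$, can be verified directly. The vectors `z` and `c` below are arbitrary illustrative values, and the helper names are assumptions.

```python
import statistics


def covariance_operator(n):
    """Omega_n = (I_n - 1_n 1_n' / n) / (n - 1), returned as a list of rows."""
    return [[((1.0 if i == j else 0.0) - 1.0 / n) / (n - 1) for j in range(n)]
            for i in range(n)]


def quadratic_form(matrix, z):
    """Compute z' M z for a square matrix given as a list of rows."""
    size = len(z)
    return sum(z[i] * matrix[i][j] * z[j] for i in range(size) for j in range(size))


n = 6
omega = covariance_operator(n)
z = [0.3, 1.2, 0.0, 2.5, 0.7, 1.1]    # arbitrary illustrative vector
c = [0.1, 1.0, -0.4, 2.0, 0.7, 0.9]   # hypothetical lower bounds c_i

row_sums = [sum(row) for row in omega]  # entries of Omega_n 1_n, which should vanish
z0 = [2.0 * max(c)] * n                 # the strictly feasible point used in the proof

print("z' Omega z:", quadratic_form(omega, z))
print("sample variance of z:", statistics.variance(z))
print("z0' Omega z0:", quadratic_form(omega, z0))
```

Since a constant vector has zero sample variance, $z_0$ satisfies the quadratic constraint strictly for any $f > 0$, which is what makes Slater's condition available.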

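The Lagrangian is

$$\mathcal{L}(z, \eta_1, \eta_2, \lambda) = \lambda z^\top \Omega_n z + (1_n - \eta_1 - \eta_2)^\top z + \eta_2^\top c - \lambda f.$$
The KKT conditions are

• Primal feasibility

• Dual feasibility: $\eta_1^*, \eta_2^* \ge 0$ component-wise and $\lambda^* \ge 0$

• Complementary slackness:

$z_i^* \eta_{1,i}^* = 0 \;\forall i$, $(z_i^* - c_i)\eta_{2,i}^* = 0 \;\forall i$ and $\lambda^*[(z^*)^\top \Omega_n z^* - f] = 0$

• First Order Condition:

$$\nabla_z \mathcal{L} = 2\lambda^* \Omega_n z^* + (1_n - \eta_1^* - \eta_2^*) = 0 \qquad (26a)$$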





Ci 

f 



V i 

V i 



(25) 
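By substituting for $\Omega_n$, (26a) can be written as

$$\frac{2\lambda^*}{n - 1}\left( z^* - \frac{1}{n}(1_n^\top z^*) 1_n \right) = -1_n + \eta_1^* + \eta_2^*. \qquad (27)$$

Suppose $S_1(z^*) \ne \emptyset$ at the optimal primal-dual point $(z^*, \eta_1^*, \eta_2^*, \lambda^*)$. Then $\exists\, i_0 \in S_1(z^*)$ such
that $z^*_{i_0} = 0$. The $i_0$-th component of (27) gives

$$-\frac{2\lambda^*}{n(n - 1)}(1_n^\top z^*) = -1 + \eta^*_{1,i_0} + \eta^*_{2,i_0}. \qquad (28)$$

Now suppose $V(z^*) \ne \emptyset$ at the optimal primal-dual point $(z^*, \eta_1^*, \eta_2^*, \lambda^*)$. Then $\exists\, j_0 \in V(z^*)$
such that $z^*_{j_0} > \max(0, c_{j_0})$, $\eta^*_{1,j_0} = 0$ and $\eta^*_{2,j_0} = 0$. The $j_0$-th component of (27) gives

$$\frac{2\lambda^*}{n - 1}\left( z^*_{j_0} - \frac{1}{n}(1_n^\top z^*) \right) = -1, \qquad (29)$$

which also implies $\lambda^* > 0$.

Now suppose $S_1(z^*)$ and $V(z^*)$ are both nonempty. Subtracting (28) from (29), we arrive at the
necessary condition

$$\frac{2\lambda^*}{n - 1} z^*_{j_0} = -\eta^*_{1,i_0} - \eta^*_{2,i_0},$$

which is clearly a contradiction since LHS > 0 whereas RHS ≤ 0. Hence $S_1(z^*)$ and $V(z^*)$ cannot
both be nonempty. □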

B.l Proof of Theorem 1 



Proof. Clearly, (CVaR-relax) is a relaxation of (CVaR-pen): the components of the variable $z$ in
(CVaR-relax) are relaxations of $\max(0, -w^\top X_i - \alpha)$. Thus the two problem formulations are
equivalent if at optimum, $z_i = \max(0, -w^\top X_i - \alpha)$ for all $i = 1, \dots, n$ for (CVaR-relax).



Let (a* , w* , z* , i^l , 1^2 ^ 7]l, 7]2, XI, X2) he the primal-dual optimal point for (jCVaR-relaxP and (jCVaR-relax-dp . 
Our aim is to show that V{z*), the set of indices for which z* > max(0, — — a), is empty. 
Suppose the contrary. Then by Lemma [71 Si{z*), the set of indices for which z* = 0, is empty. 
This means z* > 0\/ i and r/^ ^ = V i by complementary slackness. 

Now consider the sub-problem for a fixed t]2 in the dual problem (|CVaR-relax-dp : 

max - + 'i]2)^Um + m)- (30) 

r?i:r?i>0 

As $1_n$ is orthogonal to $\Omega_n$, and $\Omega_n$ is positive semidefinite, the optimal solution is of the form
$\eta_1 = a 1_n - \eta_2$, where $a$ is any constant such that $a \ge \max_j(\eta_{2,j})$, with a corresponding optimal
objective 0. Hence, bearing in mind the constraints $\eta_2 \ge 0$ and $\eta_2^\top 1_n = 1$ in (CVaR-relax-d), $\eta_1 = 0$
is one of the optimal solutions iff $\eta_2 = 1_n/n$. Thus if $\eta_2 \ne 1_n/n$, we get a contradiction. Otherwise,
we can force the dual problem to find a solution with $\eta_1 \ne 0$ by introducing an additional constraint
$\eta_1^\top 1_n \ge \delta$ for some constant $0 < \delta \ll 1$. □
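C Details of asymptotic theory
C.1 Proof of Theorem 2

Proof. By uniqueness of $\theta_0(\lambda_1,\lambda_2)$ and Assumption 1 (and compactness arguments), for every $\epsilon > 0$,
there exists $\eta > 0$ such that

$$\|\hat\theta_n(\lambda_1,\lambda_2) - \theta_0(\lambda_1,\lambda_2)\|_2 > \epsilon \implies M(\hat\theta_n;\lambda_1,\lambda_2) - M(\theta_0;\lambda_1,\lambda_2) > \eta.$$

Thus if we can show the probability of the event $\{M(\hat\theta_n;\lambda_1,\lambda_2) - M(\theta_0;\lambda_1,\lambda_2) > \eta\}$ goes to zero
for every $\epsilon > 0$, then we have consistency.
We also have

$$M_n(\hat\theta_n;\lambda_1,\lambda_2) \le M_n(\theta_0;\lambda_1,\lambda_2) + o_P(1) = M(\theta_0;\lambda_1,\lambda_2) + o_P(1), \qquad (*)$$

the first inequality because $\hat\theta_n(\lambda_1,\lambda_2)$ is a near-minimizer of $M_n$, and the second equality by the
Weak Law of Large Numbers (WLLN) on $M_n(\theta_0;\lambda_1,\lambda_2)$.
Thus

$$0 \le M(\hat\theta_n;\lambda_1,\lambda_2) - M(\theta_0;\lambda_1,\lambda_2)$$
$$= [M(\hat\theta_n;\lambda_1,\lambda_2) - M_n(\hat\theta_n;\lambda_1,\lambda_2)] + [M_n(\hat\theta_n;\lambda_1,\lambda_2) - M_n(\theta_0;\lambda_1,\lambda_2)] + [M_n(\theta_0;\lambda_1,\lambda_2) - M(\theta_0;\lambda_1,\lambda_2)]$$
$$\le M(\hat\theta_n;\lambda_1,\lambda_2) - M_n(\hat\theta_n;\lambda_1,\lambda_2) + o_P(1),$$

because the second term in [ ] is at most $o_P(1)$ by (*), and the last term in [ ] is $o_P(1)$ by the WLLN. We are
left to prove $|M_n(\hat\theta_n;\lambda_1,\lambda_2) - M(\hat\theta_n;\lambda_1,\lambda_2)| \xrightarrow{P} 0$. At first glance, one may consider invoking the
WLLN again. However, as $\hat\theta_n(\lambda_1,\lambda_2)$ is a random sequence of vectors that changes for every $n$,
we cannot apply the WLLN, which is a pointwise result (i.e. for each fixed $\theta \in \Theta$), and we need to
appeal to the stronger ULLN.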

Case I: Ai = A2 = 0. To show ULLN for the original objective, we show that J^i is a Lipschitz 
class of functions, hence ]{e,Ti, Lr{P)) for every e > 0. Now 9 1— mQ{x) = a + (1 — /5)~^(— a — 
w^x — x)^ is clearly differentiable at 9q for all x G W. Furthermore, 



Vgmg{x) = 

where I{x) := I(— a — WqX — x > 0), hence 



-1 
L^x 



m(x) := max(l, I |L'''x| loo) (31) 

is an upper bound on | |Ve7Tie(a;)| |oo and is independent of 9. Thus \mQ^{x) — mg^{x)\ < m{x)\\9i — 
02II2 for all 9i,92 G [—K,K]^~^p, and together with Assumption 2 (here a weaker assumption that 
X has finite second moment suffices), J^i is a Lipschitz class. 

Case II: $\lambda_1 \ge 0$, $\lambda_2 \ge 0$, $\lambda_1, \lambda_2$ not both zero. Corollary 3.5 in Arcones and Giné (1993) says
that ULLN also holds for the penalized objective if $N_{[\,]}(\epsilon, \mathcal{F}_2, L_2(P \times P)) < \infty$ for every $\epsilon > 0$.
Let us now show that $\mathcal{F}_2$ is also a Lipschitz class of functions. Again, it is clear that



I—)- m 



(e;Ai,A2)(^i'^2) = I [meixi) + meix2)] + ^[(^i + Lv)^ {xi - X2)]'^ + ^{zg{xi) - ze{x2)y' 
is differentiable at $\theta_0$ for all $(x_1, x_2) \in \mathbb{R}^p \times \mathbb{R}^p$. Also for all $\theta \in [-K, K]^{1+p}$,

Vg^[{wi + Lv^ {Xl - X2)]'^ = Xi{xi - X2){XI - X2)~^ {wi + Lv) 



2 

f 

2 



Ne^iiwi + Lv^ixi - X2)]^||oo < Ai||xi - j;2||Llki + -^^^lloo < XiC{K)\\xi - ^2"^ 



00 



^^e^izeixi) - zg{x2)f = X2{zd{xi) - ze{x2)) 



2 



for some constant C{K) dependent on K, and 

-I{xi) + I{x2) 

-LJxiI{xi) + L^X2I{X2) 



and 



|-26»(a;i)| = \ — {a — Wq xi — xi) 



< \a — WqXi — v'^ xi\ < K + \wq xi\ + A'|e^xi| 



NoY^^sixi) - zg{x2)f\\oo < X2\ze{xi) - Z0{x2)\irh{xi) + m{x2)) 

m as defined in Eq. ([3T]) 
< A2C'(iv:)(||xi||oo + \\x2\\oo){'ni{xi) + 771(2:2)), 
for some constant C'{K) dependent on 



hence 



"^(Ai,A2)(^i'^2) := ]^[m{xi)+m{x2)]+XiC{K)\\xi-X2\\lo+^2C\K){^^^ 

(32) 

is an upper bound on HVem^^.-^^ x^'^{xi-,X2)\\oo that is independent of 6. Thus 

l"^Si;Ai,A2)(^l'^2) -m[^2;Ai,A2)(3;i,a;2)| < m[^,,A2) (^^l > 2^2) 1 1^1 - ^2 1 b, 

and together with Assumption 2, J^2 is a Lipschitz class. □ 

C.2 Proof of Theorem 3

In what follows, we suppress the dependence on $\lambda_1, \lambda_2$ for notational convenience.

Proof. The proof parallels the proof of Theorem 5.23 of Van der Vaart (2000). Let us assume for
now that

1. For every given random sequence $h_n$ that is bounded in probability,

$$\mathbb{U}_n\left[\sqrt{n}\left(m^U_{\theta_0 + h_n/\sqrt{n}} - m^U_{\theta_0}\right) - h_n^\top \dot m^U_{\theta_0}\right] \xrightarrow{P} 0, \qquad (*)$$

and

2. $\sqrt{n}(\hat\theta_n - \theta_0) = O_P(1)$.

Since 6 1— )• M[0) is twice-differentiable, and ^ 0M{9)\Q=gf^ = by first-order condition, we can 
rewrite Eq. Q to get 



n 



\ -1 1 



\hlVg^hn + hZGn[ml^] + Op(l), 






where we use the fact, from Hoeffding decomposition, 

1 " 

= - ErhliXi)] + Op{l) = Gn[ml] + Op(l), 

i=l 

with rhg as in the statement of the theorem. 

The above statement is vahd for both hn = \/n{9n — ^0) and for hn = — Fg~^G„,m^^. Upon 
substitution, we obtain 

where the inequahty is from the definition of 9n = + hn/^/n as a near-minimizer. 
Taking the difference and completing the square, we get 

]^{K + V,'-^^Gnm];)''Ve,{K + V,~^Gnm\,) + Op(l) < 0, 

and because Vg^ is nonsingular, the quadratic form on the left must converge to zero in probability. 
The same must be true for \\hn + yQ~^G„m^^ 1 12. 

To complete the proof, we need to show that (*) and $\sqrt{n}(\hat\theta_n - \theta_0) = O_P(1)$ hold.

Proof of (*).

Let jh ■= V^(.i^^^_^_h/^ ~ "^0^) ~ ^"""^el)- we are considering only sequences hn that are 
bounded in probability, it suffices to show sup;j.||;j||2<i |U„[//j]| goes to zero in probability. Again 
by Hoeffding decomposition, for any given random sequence hn that is bounded in probability, 
Un[//i„] = 'S'nifh ] + ^n{hn)) where is the first term in the Hoeffding decomposition of Un[/ft,] 
given by 

ml{xi) = 2ExJm^(xi,X2)] -Exi,xJm^(Xi,^2)], 

and TTig as defined in the statement of the theorem. According to Lemma 19.31 in Van der Vaart 
(2000), if P2 '■= {"^0 : ^ ^ [-K,KY+p} is a Lipschitz class of functions, 

sup |G„[/ij|4o. 

h:\\h\\2<l 

Now by Assumption 2 that the $X_i$'s are iid continuous random vectors with finite fourth moment,
$\theta \mapsto m^1_\theta(x)$ is differentiable at $\theta_0$ for all $x \in \mathbb{R}^p$. Further, by the triangle inequality,

$$|m^1_{\theta_1}(x) - m^1_{\theta_2}(x)| \le 2E_{X_2}|m^U_{\theta_1}(x, X_2) - m^U_{\theta_2}(x, X_2)| + E_{X_1,X_2}|m^U_{\theta_1}(X_1, X_2) - m^U_{\theta_2}(X_1, X_2)| \le \dot{m}^1(x)\,\|\theta_1 - \theta_2\|_2,$$

where $\dot{m}^1(x) = 2E_{X_2}[\dot{m}^U(x, X_2)] + E_{X_1,X_2}[\dot{m}^U(X_1, X_2)]$, with $\dot{m}^U$ as in Eq. (32). Since the $X_i$'s have finite
fourth moment, $E[\dot{m}^1(X_1)^2] < \infty$ and thus $\mathcal{F}_2$ is a Lipschitz class.
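To make the first-order Hoeffding projection concrete, here is a minimal Python sketch on a toy kernel. It does not use the paper's kernel: the kernel $\tfrac12(x_1 - x_2)^2$, the two-point distribution, and all function names are assumptions chosen only for illustration. The U-statistic of this kernel is the unbiased sample variance, and its projection follows the formula $2E_{X_2}[m(x_1, X_2)] - E_{X_1,X_2}[m(X_1, X_2)]$.

```python
from itertools import combinations
from statistics import variance


def kernel(x1, x2):
    # Symmetric order-2 toy kernel (an assumption, not the paper's kernel).
    return 0.5 * (x1 - x2) ** 2


def u_statistic(sample):
    # Average of the kernel over all unordered pairs of distinct observations.
    pairs = list(combinations(sample, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def first_projection(x1, support, probs):
    # 2 E_{X2}[m(x1, X2)] - E_{X1,X2}[m(X1, X2)] under a discrete law
    # given by the points in `support` with probabilities `probs`.
    inner = sum(p * kernel(x1, x2) for x2, p in zip(support, probs))
    full = sum(p1 * p2 * kernel(a, b)
               for a, p1 in zip(support, probs)
               for b, p2 in zip(support, probs))
    return 2.0 * inner - full


sample = [0.3, -1.2, 2.5, 0.7, -0.4]
print(u_statistic(sample), variance(sample))  # the two values agree
print(first_projection(1.0, [-1.0, 1.0], [0.5, 0.5]))
```

For a mean-zero, unit-variance law the projection of this toy kernel reduces to $x_1^2$, so the linear term of the decomposition is an ordinary empirical process of a function of one observation. That is the structure the Lipschitz-class argument above exploits.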

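Now we are left to show $\sup_{h:\|h\|_2 \le 1} |E_n(h)| \xrightarrow{P} 0$. Let $\mathcal{F}_h := \{f_h : \|h\|_2 \le 1\}$. According to
Theorem 4.6 of Arcones and Giné (1993), $\sup_{h:\|h\|_2 \le 1} |E_n(h)| \xrightarrow{P} 0$ if $\mathcal{F}_h$ has a finite, integrable
envelope function and both $\mathcal{F}_h$ and $\mathcal{F}^1_h := \{f^1_h : \|h\|_2 \le 1\}$ are Lipschitz classes about $h = 0$. $\mathcal{F}_h$
has a finite, integrable envelope function $F(x_1, x_2) = \dot{m}^U(x_1, x_2) + \|\dot{m}^U_{\theta_0}(x_1, x_2)\|_2 < \infty$ due to
Assumption 2 and the Lipschitz property of $m^U_\theta$:

$$|f_h| = \big|\sqrt{n}\,\big(m^U_{\theta_0 + h/\sqrt{n}} - m^U_{\theta_0}\big) - h^\top \dot{m}^U_{\theta_0}\big| \le \big(\dot{m}^U + \|\dot{m}^U_{\theta_0}\|_2\big)\,\|h\|_2.$$

It is now straightforward to check that $\mathcal{F}_h$ is a Lipschitz class about $h = 0$, and $\mathcal{F}^1_h$ also, because
it inherits the key properties from $\mathcal{F}_h$.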



Proof of $\sqrt{n}(\hat\theta_n - \theta_0) = O_P(1)$.
The proof of $\sqrt{n}(\hat\theta_n(0,0) - \theta_0(0,0)) = O_P(1)$ can be found in Theorem 5.52 and Corollary 5.53 of
Van der Vaart (2000), and is a standard M-estimation result. In essence, Theorem 5.52 shows that,
under some regularity conditions, $P(\sqrt{n}\,\|\hat\theta_n(0,0) - \theta_0(0,0)\|_2 > a)$ can be bounded by $P(|G_n[m_\theta]| > a') = P(\sqrt{n}\,|M_n(\theta) - M(\theta)| > a')$, which is shown to go to zero via some maximal inequalities.
Corollary 5.53 shows that the Lipschitz condition on $\{m_\theta : \theta \in [-K,K]^{1+p}\}$ is sufficient to satisfy
the regularity conditions of the theorem.

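We can extend Theorem 5.52 to show $\sqrt{n}(\hat\theta_n(\lambda_1, \lambda_2) - \theta_0(\lambda_1, \lambda_2)) = O_P(1)$ for $\lambda_1, \lambda_2 \ge 0$ not both zero, by
bounding $P(\sqrt{n}\,\|\hat\theta_n(\lambda_1, \lambda_2) - \theta_0(\lambda_1, \lambda_2)\|_2 > a)$ by

$$P(|U_n[m^U_\theta]| > a') \le P(|G_n[m^1_\theta]| + |E'_n(\theta)| > a'),$$

where $E'_n$ is the remainder term after first-order projection of the U-process $U_n[m^U_\theta]$. It remains to
show that for every sufficiently small $\delta > 0$,

$$\sup_{\theta:\|\theta - \theta_0\|_2 < \delta} |E'_n(\theta)| \xrightarrow{P} 0, \qquad (33)$$

which can be proven using the same reasoning as for $\sup_{h:\|h\|_2 \le 1} |E_n(h)| \xrightarrow{P} 0$ in the proof of (*). □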

C.3 Computation of key statistics 

Given the distribution for $X$, both $A_0 = A_{\theta_0}(0,0)$ and $B_0 = B_{\theta_0}(0,0)$ are computable. The
lemma below computes the key quantities that constitute $A_0$ and $B_0$ when $X \sim N(\mu, \Sigma)$.

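Lemma 8. Suppose $X \sim N(\mu, \Sigma)$, and $Z_1 := z_\theta(X) = -\alpha - w^\top X \sim N(\mu_1, \sigma_1^2)$, where $\mu_1 = -\sigma_1 \Phi^{-1}(\beta)$
and $\sigma_1^2 = w^\top \Sigma w$. Then

$$E[\max(z_\theta(X), 0)] = \frac{\sigma_1}{\sqrt{2\pi}} \exp\!\Big(-\tfrac{1}{2}\Phi^{-1}(\beta)^2\Big) - \sigma_1 (1-\beta)\,\Phi^{-1}(\beta),$$

$$E[L_j^\top X\, I(Z_1 > 0)] = (1-\beta)\Big(L_j^\top \mu - \Phi^{-1}(\beta)\,\frac{L_j^\top \Sigma w}{\sigma_1}\Big) - \frac{L_j^\top \Sigma w}{\sigma_1^2}\, E[\max(z_\theta(X), 0)],$$

$$p_0 := p(z_\theta(X) = 0) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\!\Big(-\tfrac{1}{2}\Phi^{-1}(\beta)^2\Big),$$

$$E[L_j^\top X \mid Z_1 = 0] = L_j^\top \mu - \Phi^{-1}(\beta)\,\frac{L_j^\top \Sigma w}{\sigma_1},$$

$$E[L_i^\top X L_j^\top X\, I(Z_1 > 0)] = \tfrac{1}{4}\Big(g\big(\mu_1, (L_j + L_i)^\top \mu, \sigma_1, \sigma_2, -(L_j + L_i)^\top \Sigma w\big) - g\big(\mu_1, (L_j - L_i)^\top \mu, \sigma_1, \sigma_2, -(L_j - L_i)^\top \Sigma w\big)\Big),$$

$$E[L_i^\top X L_j^\top X \mid Z_1 = 0] = \tfrac{1}{4}\Big(h\big(\mu_1, (L_j + L_i)^\top \mu, \sigma_1, \sigma_2, -(L_j + L_i)^\top \Sigma w\big) - h\big(\mu_1, (L_j - L_i)^\top \mu, \sigma_1, \sigma_2, -(L_j - L_i)^\top \Sigma w\big)\Big),$$

where $p_0$ is the density of $z_\theta(X)$ at zero, $\sigma_2^2 = (L_j \pm L_i)^\top \Sigma (L_j \pm L_i)$ in the respective term, and

$$g(\mu_1, \mu_2, \sigma_1, \sigma_2, \sigma_{12}) = (1-\beta)\big[\mu_2^2 + \sigma_2^2\big] + p_0\,\sigma_{12}\Big[\frac{\sigma_{12}}{\sigma_1}\Phi^{-1}(\beta) + 2\mu_2\Big],$$

$$h(\mu_1, \mu_2, \sigma_1, \sigma_2, \sigma_{12}) = \Big(\mu_2 + \frac{\sigma_{12}}{\sigma_1}\Phi^{-1}(\beta)\Big)^2 + \sigma_2^2 - \frac{\sigma_{12}^2}{\sigma_1^2}.$$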
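As a sanity check on the two scalar quantities of the lemma, the following Python sketch evaluates the closed forms for $E[\max(z_\theta(X), 0)]$ and the density at zero, and compares the first against a simple numerical integral. The function names, the midpoint rule and the truncation point are assumptions made for illustration; only the Python standard library is used.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for Phi^{-1}


def lemma_scalar_terms(sigma1, beta):
    """Return (E[max(Z1, 0)], density of Z1 at 0) for Z1 ~ N(mu1, sigma1^2),
    where mu1 = -sigma1 * Phi^{-1}(beta)."""
    q = STD.inv_cdf(beta)
    bell = exp(-0.5 * q * q) / sqrt(2.0 * pi)
    e_pos = sigma1 * bell - sigma1 * (1.0 - beta) * q
    p0 = bell / sigma1
    return e_pos, p0


def e_pos_numeric(sigma1, beta, steps=20000):
    """Midpoint-rule approximation of the integral of z * f(z) over z > 0."""
    mu1 = -sigma1 * STD.inv_cdf(beta)
    z1 = NormalDist(mu1, sigma1)
    upper = max(mu1, 0.0) + 12.0 * sigma1  # mass beyond this point is negligible
    dz = upper / steps
    return sum((k + 0.5) * dz * z1.pdf((k + 0.5) * dz) for k in range(steps)) * dz


e_pos, p0 = lemma_scalar_terms(sigma1=2.0, beta=0.95)
print(e_pos, e_pos_numeric(2.0, 0.95), p0)
```

The closed form and the numerical integral should agree to several decimal places; a mismatch would point to a sign or scaling slip in transcribing the formula.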



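Proof. We use the fact that if $Z_1 \sim N(\mu_1, \sigma_1^2)$ and $Z_2 \sim N(\mu_2, \sigma_2^2)$ are jointly normal,

$$Z_2 \mid Z_1 \sim N\Big(\mu_2 + \frac{\sigma_{12}}{\sigma_1^2}(Z_1 - \mu_1),\; \sigma_2^2 - \frac{\sigma_{12}^2}{\sigma_1^2}\Big),$$

where $\sigma_{12} = \mathrm{Cov}(Z_1, Z_2)$.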
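The conditional-normal fact can be turned into a small helper. The sketch below is illustrative only: the function names and the two-asset numbers in the usage lines are assumptions, and vectors and matrices are plain Python lists. It also evaluates $E[L^\top X \mid Z_1 = 0]$ in the form $L^\top\mu - \Phi^{-1}(\beta)\, L^\top \Sigma w / \sigma_1$, so the two routes can be compared.

```python
from math import sqrt
from statistics import NormalDist


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def conditional_normal(mu1, mu2, sigma1, sigma2, sigma12, z1):
    """Mean and variance of Z2 given Z1 = z1 for jointly normal (Z1, Z2)."""
    slope = sigma12 / sigma1 ** 2
    return mu2 + slope * (z1 - mu1), sigma2 ** 2 - sigma12 ** 2 / sigma1 ** 2


def cond_mean_given_zero(L, mu, Sigma, w, beta):
    """E[L^T X | Z1 = 0] with Z1 = -alpha - w^T X and E[Z1] = -sigma1 * Phi^{-1}(beta)."""
    Sw = mat_vec(Sigma, w)
    sigma1 = sqrt(dot(w, Sw))
    q = NormalDist().inv_cdf(beta)
    return dot(L, mu) - q * dot(L, Sw) / sigma1


# Hypothetical two-asset example.
mu, Sigma = [0.01, 0.02], [[0.04, 0.01], [0.01, 0.09]]
w, L, beta = [0.5, 0.5], [1.0, 0.0], 0.95
print(cond_mean_given_zero(L, mu, Sigma, w, beta))
```

Here $\sigma_{12} = -L^\top \Sigma w$ carries a minus sign because $Z_1$ depends on $X$ through $-w^\top X$.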

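• Terms involving only $L_j^\top X$.

Note that from the conditional distribution above, $E[Z_2 \mid Z_1 = 0] = \mu_2 - \frac{\sigma_{12}}{\sigma_1^2}\mu_1$. Let $Z_2 = L_j^\top X$, and recall that $E(L_j^\top X) = L_j^\top \mu$
and $E(Z_1) = -\sigma_1 \Phi^{-1}(\beta)$. Also, note that $\sigma_{12} = -L_j^\top \Sigma w$. After some algebra, we get the stated expression for $E[L_j^\top X \mid Z_1 = 0]$.
Since we know the distribution of $Z_2 \mid Z_1$, we have

$$E[Z_2\, I(Z_1 > 0)] = E\Big[I(Z_1 > 0)\Big(\mu_2 + \frac{\sigma_{12}}{\sigma_1^2}(Z_1 - \mu_1)\Big)\Big]$$
$$= (1-\beta)\Big(\mu_2 - \frac{\sigma_{12}}{\sigma_1^2}\mu_1\Big) + \frac{\sigma_{12}}{\sigma_1^2}\, E[Z_1\, I(Z_1 > 0)]$$
$$= (1-\beta)\Big(L_j^\top \mu - \Phi^{-1}(\beta)\,\frac{L_j^\top \Sigma w}{\sigma_1}\Big) - \frac{L_j^\top \Sigma w}{\sigma_1^2}\, E[\max(Z_1, 0)].$$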

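• Terms involving $L_i^\top X L_j^\top X$.

To compute $E[L_i^\top X L_j^\top X\, I(Z_1 > 0)]$ and $E[L_i^\top X L_j^\top X \mid Z_1 = 0]$, first note that

$$E[L_i^\top X L_j^\top X\, I(Z_1 > 0)] = \tfrac{1}{4}\, E\Big[\big((L_j^\top X + L_i^\top X)^2 - (L_j^\top X - L_i^\top X)^2\big)\, I(Z_1 > 0)\Big],$$

and similarly

$$E[L_i^\top X L_j^\top X \mid Z_1 = 0] = \tfrac{1}{4}\, E\Big[(L_j^\top X + L_i^\top X)^2 - (L_j^\top X - L_i^\top X)^2 \,\Big|\, Z_1 = 0\Big].$$

Hence it is sufficient to first find expressions for $E[Z_2^2\, I(Z_1 > 0)]$ and $E[Z_2^2 \mid Z_1 = 0]$ for some normal
$Z_2$, then apply the resulting formulae to $Z_2 = (L_j \pm L_i)^\top X$. This results in $\mu_2 = (L_j \pm L_i)^\top \mu$,
$\sigma_{12} = -(L_j \pm L_i)^\top \Sigma w$ and $\sigma_2^2 = (L_j \pm L_i)^\top \Sigma (L_j \pm L_i)$.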

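From the tower property and the conditional distribution of $Z_2 \mid Z_1$,

$$E[Z_2^2\, I(Z_1 > 0)] = E\Big[I(Z_1 > 0)\Big(\sigma_2^2 - \frac{\sigma_{12}^2}{\sigma_1^2} + \Big(\mu_2 + \frac{\sigma_{12}}{\sigma_1^2}(Z_1 - \mu_1)\Big)^2\Big)\Big].$$

By simple computations,

$$E[(Z_1 - \mu_1)\, I(Z_1 > 0)] = \frac{\sigma_1}{\sqrt{2\pi}} \exp\!\big(-\mu_1^2/(2\sigma_1^2)\big) = \sigma_1^2 f_{Z_1}(0) = \sigma_1^2 p_0, \quad \text{and}$$

$$E[(Z_1 - \mu_1)^2\, I(Z_1 > 0)] = \sigma_1^2\big((1-\beta) - \mu_1 p_0\big).$$

Now $\mu_1/\sigma_1 = -\Phi^{-1}(\beta)$, and

$$E[Z_2^2\, I(Z_1 > 0)] = (1-\beta)\big[\mu_2^2 + \sigma_2^2\big] + p_0\Big[-\frac{\sigma_{12}^2}{\sigma_1^2}\mu_1 + 2\sigma_{12}\mu_2\Big]$$
$$= (1-\beta)\big[\mu_2^2 + \sigma_2^2\big] + p_0\,\sigma_{12}\Big[\frac{\sigma_{12}}{\sigma_1}\Phi^{-1}(\beta) + 2\mu_2\Big]$$
$$:= g(\mu_1, \mu_2, \sigma_1, \sigma_2, \sigma_{12}).$$

Similarly,

$$E[Z_2^2 \mid Z_1 = 0] = \Big(\mu_2 - \frac{\sigma_{12}}{\sigma_1^2}\mu_1\Big)^2 + \sigma_2^2 - \frac{\sigma_{12}^2}{\sigma_1^2} = \Big(\mu_2 + \frac{\sigma_{12}}{\sigma_1}\Phi^{-1}(\beta)\Big)^2 + \sigma_2^2 - \frac{\sigma_{12}^2}{\sigma_1^2} := h(\mu_1, \mu_2, \sigma_1, \sigma_2, \sigma_{12}).$$

□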
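The second-moment formulas can be checked numerically. The Python sketch below codes $g$ and $h$ in the form $g = (1-\beta)[\mu_2^2 + \sigma_2^2] + p_0\sigma_{12}[\sigma_{12}\Phi^{-1}(\beta)/\sigma_1 + 2\mu_2]$ and $h = (\mu_2 + \sigma_{12}\Phi^{-1}(\beta)/\sigma_1)^2 + \sigma_2^2 - \sigma_{12}^2/\sigma_1^2$, and compares $g$ with a midpoint-rule integral of the conditional second moment against the density of $Z_1$ over $z > 0$. The names, the integration scheme and the test values are assumptions for illustration. Here $\beta$ is recovered from $\mu_1/\sigma_1$ rather than passed in, so the signatures mirror the five arguments of the lemma.

```python
from statistics import NormalDist

STD = NormalDist()


def g(mu1, mu2, s1, s2, s12):
    """Closed form for E[Z2^2 1{Z1 > 0}] when mu1 = -s1 * Phi^{-1}(beta)."""
    q = -mu1 / s1                       # Phi^{-1}(beta)
    one_minus_beta = STD.cdf(mu1 / s1)  # P(Z1 > 0)
    p0 = NormalDist(mu1, s1).pdf(0.0)   # density of Z1 at zero
    return one_minus_beta * (mu2 ** 2 + s2 ** 2) + p0 * s12 * (s12 * q / s1 + 2.0 * mu2)


def h(mu1, mu2, s1, s2, s12):
    """Closed form for E[Z2^2 | Z1 = 0]."""
    q = -mu1 / s1
    return (mu2 + s12 * q / s1) ** 2 + s2 ** 2 - s12 ** 2 / s1 ** 2


def g_numeric(mu1, mu2, s1, s2, s12, steps=20000):
    """Midpoint rule for the integral of E[Z2^2 | Z1 = z] * f_{Z1}(z) over z > 0."""
    z1 = NormalDist(mu1, s1)
    cond_var = s2 ** 2 - s12 ** 2 / s1 ** 2
    upper = max(mu1, 0.0) + 12.0 * s1   # mass beyond this point is negligible
    dz = upper / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * dz
        cond_mean = mu2 + s12 / s1 ** 2 * (z - mu1)
        total += (cond_var + cond_mean ** 2) * z1.pdf(z)
    return total * dz


mu1 = -STD.inv_cdf(0.95)  # sigma1 = 1, beta = 0.95
args = (mu1, 0.5, 1.0, 2.0, -0.8)
print(g(*args), g_numeric(*args), h(*args))
```

The product terms of the lemma then follow by evaluating `g` or `h` at $(L_j + L_i)$ and $(L_j - L_i)$ and taking one quarter of the difference, as in the polarization step of the proof.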